SAS/ETS® 9.22
User’s Guide
SAS Documentation
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2010. SAS/ETS® 9.22 User’s Guide. Cary, NC: SAS Institute Inc.

SAS/ETS® 9.22 User’s Guide

Copyright © 2010, SAS Institute Inc., Cary, NC, USA

ISBN 978-1-60764-543-6

All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st electronic book, May 2010
1st printing, May 2010

SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
Contents

I General Information
   Chapter 1. What’s New in SAS/ETS 9.22
   Chapter 2. Introduction
   Chapter 3. Working with Time Series Data
   Chapter 4. Date Intervals, Formats, and Functions
   Chapter 5. SAS Macros and Functions
   Chapter 6. Nonlinear Optimization Methods

II Procedure Reference
   Chapter 7. The ARIMA Procedure
   Chapter 8. The AUTOREG Procedure
   Chapter 9. The COMPUTAB Procedure
   Chapter 10. The COUNTREG Procedure
   Chapter 11. The DATASOURCE Procedure
   Chapter 12. The ENTROPY Procedure (Experimental)
   Chapter 13. The ESM Procedure
   Chapter 14. The EXPAND Procedure
   Chapter 15. The FORECAST Procedure
   Chapter 16. The LOAN Procedure
   Chapter 17. The MDC Procedure
   Chapter 18. The MODEL Procedure
   Chapter 19. The PANEL Procedure
   Chapter 20. The PDLREG Procedure
   Chapter 21. The QLIM Procedure
   Chapter 22. The SEVERITY Procedure (Experimental)
   Chapter 23. The SIMILARITY Procedure
   Chapter 24. The SIMLIN Procedure
   Chapter 25. The SPECTRA Procedure
   Chapter 26. The STATESPACE Procedure
   Chapter 27. The SYSLIN Procedure
   Chapter 28. The TIMEID Procedure (Experimental)
   Chapter 29. The TIMESERIES Procedure
   Chapter 30. The TSCSREG Procedure
   Chapter 31. The UCM Procedure
   Chapter 32. The VARMAX Procedure
   Chapter 33. The X11 Procedure
   Chapter 34. The X12 Procedure

III Data Access Engines
   Chapter 35. The SASECRSP Interface Engine
   Chapter 36. The SASEFAME Interface Engine
   Chapter 37. The SASEHAVR Interface Engine

IV Time Series Forecasting System
   Chapter 38. Overview of the Time Series Forecasting System
   Chapter 39. Getting Started with Time Series Forecasting
   Chapter 40. Creating Time ID Variables
   Chapter 41. Specifying Forecasting Models
   Chapter 42. Choosing the Best Forecasting Model
   Chapter 43. Using Predictor Variables
   Chapter 44. Command Reference
   Chapter 45. Window Reference
   Chapter 46. Forecasting Process Details

V SAS/ETS Model Editor (Experimental)
   Chapter 47. SAS/ETS Model Editor Window Reference

VI Investment Analysis
   Chapter 48. Overview
   Chapter 49. Portfolios
   Chapter 50. Investments
   Chapter 51. Computations
   Chapter 52. Analyses
   Chapter 53. Details

Subject Index
Syntax Index
Credits and Acknowledgments
Credits
Documentation Editing
Anne Jones
Technical Review
Evan L. Anderson, Ming-Chun Chang, Jan Chvosta, Brent Cohen, Allison Crutchfield, Paige Daniels, Gül Ege, Bruce Elsheimer, Donald J. Erdman, Kelly Fellingham, Sanggohn Han, Laura Jackson, Wilma S. Jackson, Wen Ji, Kurt Jones, Kathleen Kiernan, Michael J. Leonard, Li C. Li, Mark R. Little, Kevin Meyer, Gina Marie Mondello, Steve Morrison, Youngjin Park, Jim Seabolt, David Schlotzhauer, Rajesh Selukar, Jennifer Sloan, Mark Traccarella, Michele A. Trovero, Charles Sun, Donna E. Woodward
Documentation Production
Tim Arnold
Software

The procedures in SAS/ETS software were implemented by members of the Advanced Analytics division. Program development includes design, programming, debugging, support, documentation, and technical review. In the following list, the name of the developer who currently has principal support responsibility for the procedure is listed first.
ARIMA
Rajesh Selukar, Michael J. Leonard, Terry Woodfield
AUTOREG
Xilong Chen, Jan Chvosta, Richard Potter, Jason Qiao, John P. Sall
COMPUTAB
Michael J. Leonard, Alan R. Eaton
COUNTREG
Jan Chvosta, Laura Jackson
DATASOURCE
Kelly Fellingham, Meltem Narter
ENTROPY
Xilong Chen, Arthur Sinko, Greg Sterijevski, Donald J. Erdman
ESM
Michael J. Leonard
EXPAND
Marc Kessler, Michael J. Leonard, Mark R. Little
FORECAST
Michael J. Leonard, Mark R. Little, John P. Sall
LOAN
Richard Potter, Gül Ege
MDC
Jan Chvosta
MODEL
Marc Kessler, Donald J. Erdman, Mark R. Little, John P. Sall
PANEL
Jan Chvosta, Greg Sterijevski
PDLREG
Xilong Chen, Richard Potter, Jan Chvosta, Leigh A. Ihnen
QLIM
Jan Chvosta
SIMILARITY
Michael J. Leonard
SEVERITY
Mahesh V. Joshi
SIMLIN
Mark R. Little, John P. Sall
SPECTRA
Marc Kessler, Rajesh Selukar, Donald J. Erdman, John P. Sall
STATESPACE
Donald J. Erdman, Michael J. Leonard
SYSLIN
Laura Jackson, Donald J. Erdman, Leigh A. Ihnen, John P. Sall
TIMEID
Marc Kessler, Michael J. Leonard
TIMESERIES
Marc Kessler, Michael J. Leonard
TSCSREG
Jan Chvosta
UCM
Rajesh Selukar
VARMAX
Youngjin Park
X11
Wilma S. Jackson, R. Bart Killam, Leigh A. Ihnen, Richard D. Langston
X12
Wilma S. Jackson
Time Series Forecasting System
Evan L. Anderson, Michael J. Leonard, Meltem Narter, Gül Ege
Investment Analysis System
Gül Ege, Scott Gray, Michael J. Leonard
Compiler and Symbolic Differentiation
Andrew Henrick, Stacey Christian
SASEHAVR
Kelly Fellingham
SASECRSP
Kelly Fellingham, Peng Zang
SASEFAME
Kelly Fellingham
Testing
Shu An, Ming-Chun Chang, Bruce Elsheimer, Kelly Fellingham, Sanggohn Han, Li C. Li, Jennifer Sloan, Charles Sun, Peng Zang
Technical Support
Members
Paige Daniels, Wen Ji, Kurt Jones, Kathleen Kiernan, Gina Marie Mondello, David Schlotzhauer, Donna E. Woodward
Acknowledgments

Hundreds of people have helped the SAS System in many ways since its inception. The following individuals have been especially helpful in the development of the procedures in SAS/ETS software. Acknowledgments for the SAS System generally appear in Base SAS® software documentation and SAS/ETS software documentation.

David Amick, Idaho Office of Highway Safety
David M. DeLong, Duke University
David Dickey, North Carolina State University
Douglas J. Drummond, Center for Survey Statistics
Michel Ferland, Statistics Canada
Susie Fortier, Statistics Canada
William Fortney, Boeing Computer Services
Wayne Fuller, Iowa State University
A. Ronald Gallant, The University of North Carolina at Chapel Hill
Phil Hanser, Sacramento Municipal Utilities District
Marvin Jochimsen, Mississippi R&O Center
Jeff Kaplan, Sun Guard
Ken Kraus, Center for Research in Security Prices
Dominique Ladiray, INSEE
George McCollister, San Diego Gas & Electric
Douglas Miller, Purdue University
Brian Monsell, U.S. Census Bureau
Robert Parks, Washington University
Benoit Quenneville, Statistics Canada
Gregory Sali, Idaho Office of Highway Safety
Bob Spatz, Center for Research in Security Prices
Mary Young, Salt River Project
The final responsibility for the SAS System lies with SAS Institute alone. We hope that you will always let us know your opinions about the SAS System and its documentation. It is through your participation that SAS software is continuously improved.
Part I
General Information
Chapter 1
What’s New in SAS/ETS 9.22

Contents
   Overview
      Highlights of Enhancements
      Highlights of Enhancements in SAS/ETS 9.2
   AUTOREG Procedure
   COUNTREG Procedure
   MDC Procedure
   MODEL Procedure
   QLIM Procedure
   SASEFAME Engine
   SASEHAVR Engine
   New SEVERITY Procedure (Experimental)
   SIMILARITY Procedure
   New TIMEID Procedure (Experimental)
   TIMESERIES Procedure
   UCM Procedure
   X12 Procedure
   SAS/ETS Model Editor Application (Experimental)
   Date Intervals, Formats, and Functions
Overview

This chapter summarizes the new features available in SAS/ETS 9.22. If you have used SAS/ETS procedures in the past, you can review this chapter to learn about the new features that have been added. When you see a new feature that might be useful for your work, turn to the appropriate chapter to read about the feature in detail.
Highlights of Enhancements

The following new procedures have been added to SAS/ETS software:
- The SEVERITY procedure (Experimental)
- The TIMEID procedure (Experimental)

The SIMILARITY procedure, which performs similarity analysis for sets of time series, was experimental in the previous release and is now production status.

A new Java application, called the SAS/ETS Model Editor (Experimental), provides a graphical user interface for editing nonlinear statistical models and provides a convenient way to use the MODEL procedure.

New features have been added to the following SAS/ETS components:

- The AUTOREG procedure
- The COUNTREG procedure
- The MDC procedure
- The MODEL procedure
- The QLIM procedure
- SASEFAME interface engine
- SASEHAVR interface engine
- The TIMESERIES procedure
- The UCM procedure
- The X12 procedure

New features for defining custom time intervals have been added to Base SAS software that might be of interest to SAS/ETS users. For more information, see SAS Language Reference: Dictionary.
Highlights of Enhancements in SAS/ETS 9.2

Users who are updating directly to SAS/ETS 9.22 from a release prior to SAS/ETS 9.2 can find information about the SAS/ETS 9.2 changes and enhancements in the chapter “What’s New in SAS/ETS” in the SAS/ETS 9.2 User’s Guide (see support.sas.com/whatsnewets92).
AUTOREG Procedure

The following new features have been added to the AUTOREG procedure:
- Three asymmetric GARCH models (quadratic GARCH, threshold GARCH, and power GARCH) are implemented to measure the impact of news on future volatility. Power GARCH also accounts for the long-memory property of volatility.

- In addition to the two existing tests for ARCH effects, Lee and King’s ARCH test and Wong and Li’s ARCH test are implemented. Lee and King’s ARCH test is a one-sided locally most mean powerful (LMMP) test; Wong and Li’s ARCH test is robust to outliers. If the NLAG= option is specified, these statistics can also be computed for the final model residuals in addition to the OLS residuals.

- The Hannan-Quinn criterion (HQC) is implemented and included in the summary statistics.

- Four statistical tests of independence are implemented: the BDS test, the runs test, the turning point test, and the rank version of the von Neumann ratio test. They are powerful tools for model selection and specification testing.

- The augmented Dickey-Fuller (ADF) unit root test is implemented. This test accounts for some forms of dependence between the innovations of the time series. The ADF formulation includes lags of order p in the regression; when the lag is specified to be zero, it reduces to the standard Dickey-Fuller unit root test. In the presence of regressors, the Engle-Granger cointegration test is performed by using the augmented Dickey-Fuller test statistic.

- The Elliott-Rothenberg-Stock (ERS) and Ng-Perron (NP) unit root tests are implemented. These tests also perform automatic lag length selection by using an information criterion: the Bayesian information criterion (BIC) in the ERS test and the modified Akaike information criterion (AICc) in the Ng-Perron test.

- The CLASS statement is now supported. A CLASS statement enables you to declare classification variables for use as explanatory effects in a model. When a CLASS variable is used as a predictor in the MODEL statement, the procedure automatically creates a dummy regressor that corresponds to each discrete value or level of the CLASS variable.

- The MODEL statement now supports the use of CLASS variables and interaction terms as predictors.

- The AR, GARCH, and HETERO parameters can be specified in the TEST and RESTRICT statements. The likelihood ratio (LR) test and the Lagrange multiplier (LM) test are supported in the TEST statement when the GARCH= option is specified.
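For example, the following statements sketch how two of these features might be used together. The data set Work.Sales and the variables y, x, and region are hypothetical, and TYPE=TGARCH is shown as one plausible way to request the threshold GARCH specification:

   /* Minimal sketch: CLASS variable as predictor plus a threshold
      GARCH error model (hypothetical data set and variables). */
   proc autoreg data=work.sales;
      class region;                               /* new CLASS statement */
      model y = x region / nlag=2
                           garch=(p=1, q=1, type=tgarch);
   run;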
COUNTREG Procedure

The following new features have been added to the COUNTREG procedure:

- The CLASS statement is now supported. A CLASS statement enables you to declare classification variables for use as explanatory effects in a model. When a CLASS variable is used as a predictor in the MODEL statement, the procedure automatically creates a dummy regressor that corresponds to each discrete value or level of the CLASS variable.

- The MODEL statement now supports the use of CLASS variables and interaction terms as predictors.

- The FREQ statement is now supported. A FREQ statement specifies a variable whose values indicate the number of cases that are represented by each observation. That is, the procedure treats each observation as if it had appeared n times in the input data set, where n is the value of the FREQ variable.

- The WEIGHT statement is now supported. A WEIGHT statement specifies a variable whose values supply weights for each observation in the data set. These weights control the importance (weight) given to the data observations in fitting the model.

- The NLOPTIONS statement enables you to specify options for the subsystem that is used for the nonlinear optimization.
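A minimal sketch of these statements follows; the data set Work.Claims and all variable names are hypothetical:

   /* Hypothetical claims data: new CLASS, FREQ, and NLOPTIONS statements. */
   proc countreg data=work.claims;
      class plan;                           /* new CLASS statement    */
      freq ncases;                          /* new FREQ statement     */
      model numclaims = age plan / dist=negbin;
      nloptions tech=quanew maxiter=500;    /* new NLOPTIONS statement */
   run;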
MDC Procedure

The following new features have been added to the MDC procedure:

- The CLASS statement is now supported. A CLASS statement enables you to declare classification variables for use as explanatory effects in a model. When a CLASS variable is used as a predictor in the MODEL statement, the procedure automatically creates a dummy regressor that corresponds to each discrete value or level of the CLASS variable.

- The MODEL statement now supports the use of CLASS variables and interaction terms as predictors.

- The TEST statement is now supported to test linear equality restrictions on the parameters. Three tests are available: Wald, Lagrange multiplier, and likelihood ratio.
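The following sketch shows the new TEST statement in a conditional logit model; the data set and all variable names are hypothetical:

   /* Hypothetical travel-mode data: the new TEST statement. */
   proc mdc data=work.travel;
      id pid;                                /* individual identifier */
      model decision = ttime cost / type=clogit nchoice=3;
      test ttime = 0 / wald;                 /* new TEST statement    */
   run;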
MODEL Procedure

The following feature has been added to the MODEL procedure:

- For the GMM estimation method, Hansen’s J statistic for the test of overidentifying restrictions is reported along with its probability.
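A GMM fit that reports the J statistic might look like the following sketch; the model, the data set, and the instruments are hypothetical:

   /* Hypothetical demand model estimated by GMM; the J statistic
      appears in the fit summary when the model is overidentified. */
   proc model data=work.demand;
      parms a b;
      q = a + b * price;
      fit q / gmm;
      instruments income lagprice;
   run;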
QLIM Procedure

The following new features have been added to the QLIM procedure:

- The TE1 and TE2 options output technical efficiency measures for each producer in stochastic frontier models, as suggested by Battese and Coelli (1988) and Jondrow et al. (1982).

- The WEIGHT statement is now supported. A WEIGHT statement identifies a variable to supply weights for each observation in the data set. By default, the weights are normalized so that they add up to the sample size. If the NONORMALIZE option is used, the actual weights are used without normalization.
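The following sketch requests the new technical efficiency measures for a stochastic production frontier; the data set and variables are hypothetical:

   /* Hypothetical production-frontier data: request the new
      TE1 and TE2 efficiency measures in the output data set. */
   proc qlim data=work.farms;
      model logoutput = loglabor logland;
      endogenous logoutput ~ frontier(type=exponential, production);
      output out=work.eff te1 te2;        /* new TE1 and TE2 options */
   run;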
SASEFAME Engine

The SASEFAME interface engine provides a seamless interface between Fame and SAS data that enables SAS users to access and process time series, case series, and formulas that reside in a Fame database. The following enhancements have been made to the SASEFAME access engine for Fame databases:

- The INSET= option enables you to pass Fame commands through an input SAS data set and select your Fame input variables by using the KEEPLIST= clause or the WHERE= clause as selection input for BY variables.

- The DBVERSION= option displays the version number of the Fame Work database in the SAS log. SASEFAME uses Fame 10, which does not allow version 2 databases. Use the Fame compress utility with the -m option to convert your version 2 databases to version 3 or 4. The default is version 4.

- The TUNEFAME= option tunes the Fame database engine’s use of memory to reduce I/O times in favor of a bigger virtual memory for caching database objects. The default is 100 MB.

- The TUNECHLI= option tunes the C host language interface (CHLI) database engine’s use of memory to reduce I/O times in favor of a bigger virtual memory for caching database objects. The default is 100 MB.

- The WILDCARD= option enables you to select series by using the new Fame 10 wildcarding capabilities, which allow a longer 242-character wildcard to match data object series names within the Fame database.

- The interface uses the most current version of the Fame 10 CHLI. The SAS log reports the version number of the Fame 10 CHLI:

   NOTE: The SASEFAME engine is using Version 10.03 of the HLI.
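A SASEFAME libref that uses some of these options might look like the following sketch; the physical path, the wildcard, the member name, and the option values shown are all illustrative rather than taken from the documentation:

   /* Hypothetical Fame database path and series; option values
      are illustrative. */
   libname famedb sasefame 'c:\fame\data'
      wildcard="?gdp?"                   /* new Fame 10 wildcard selection */
      dbversion=on;                      /* report the database version   */

   proc print data=famedb.gdpseries;
   run;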
SASEHAVR Engine

The SASEHAVR interface engine is a seamless interface between Haver and SAS data processing that enables SAS users to read economic and financial time series data that reside in a Haver Analytics DLX (Data Link Express) database. The following enhancements have been made to the SASEHAVR access engine for Haver Analytics databases:

- The AGGMODE= option enables you to specify a STRICT or RELAXED aggregation method; AGGMODE=RELAXED is the default setting. Aggregation is supported only from a more frequent time interval to a less frequent time interval, such as from weekly to monthly. The SAS log reports the status of AGGMODE.

- The SHORT= option enables you to specify the list of Haver short sources to be included in the output SAS data set, and the DROPSHORT= option enables you to specify the list of Haver short sources to be excluded. Each list is comma-delimited and must be surrounded by quotation marks.

- The LONG= and DROPLONG= options similarly include or exclude Haver long sources; each list is comma-delimited and must be surrounded by quotation marks.

- The GEOG1= and DROPGEOG1= options include or exclude Haver geography1 codes, and the GEOG2= and DROPGEOG2= options include or exclude Haver geography2 codes; each list is comma-delimited and must be surrounded by quotation marks.

- The OUTSELECT=ON option specifies that the output data set show the values of selection keys (such as geography codes, groups, sources, and short and long sources) for each selected variable name (time series) in the database. The OUTSELECT=OFF option, the default, specifies that the output data set show the observations in range for all selected time series. The SAS log reports the status of OUTSELECT.

- The interface now uses the most current version of DLXAPI32. The SAS log reports the version number of the Haver DLX API.
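A SASEHAVR libref that uses the new source-selection options might look like this sketch; the path and the source list are hypothetical:

   /* Hypothetical Haver DLX database path and short-source list. */
   libname hvr sasehavr 'c:\haver\data'
      short="BLS,BEA"                    /* include these short sources */
      aggmode=relaxed
      outselect=off;

   proc contents data=hvr._all_ nods;
   run;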
New SEVERITY Procedure (Experimental)

The new SEVERITY procedure fits models for statistical distributions of the severity (magnitude) of events. Two examples of events that are typically modeled with the procedure are insurance loss payments and intermittent sales of products. The SEVERITY procedure is experimental for this release. It provides the following features:

- The magnitude of events can be modeled as a random variable with a continuous parametric probability distribution. The SEVERITY procedure uses the maximum likelihood method to fit multiple specified distributions and identifies the best model based on a specified model selection criterion.

- The SEVERITY procedure is delivered with a set of predefined models for several commonly used distributions: the Burr, exponential, gamma, inverse Gaussian, lognormal, Pareto, generalized Pareto, and Weibull distributions.

- The SEVERITY procedure can be extended to fit any continuous parametric distribution. You specify the distribution’s model by using a set of functions and subroutines that are defined by using the FCMP procedure. The model must include functions that provide the values of the probability density function (PDF) and the cumulative distribution function (CDF) of the distribution. The model can also optionally include functions or subroutines that provide the distribution’s description, the number of parameters, initial values and bounds for the parameters, the scale parameter transform, and the gradient vector and the Hessian matrix of the PDF and the CDF with respect to the parameters.

- Exogenous variables can be specified for fitting a model that has a scale parameter. The exogenous variables are modeled such that their linear combination affects the scale parameter via a specified link function. The regression coefficients that are associated with the variables in the linear combination are estimated along with the parameters of the distribution. Currently, only the exponential link function is supported.

- Censoring and truncation can be specified for each observed value of the response variable. Global values can also be specified to override the individual values that are associated with each observed value. Currently, only censoring from above (that is, right-censoring) and truncation from below (that is, left-truncation) are allowed.
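A minimal sketch of fitting several predefined distributions and selecting the best by a criterion follows; the data set and variable are hypothetical, and the option and distribution keywords are shown as we understand the experimental syntax:

   /* Hypothetical loss data: fit several predefined distributions
      and select the best fit by corrected AIC. */
   proc severity data=work.losses crit=aicc;
      loss lossamount;
      dist burr exp gamma igauss logn pareto gpd weibull;
   run;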
SIMILARITY Procedure

The SIMILARITY procedure was classified as experimental in SAS/ETS 9.2. PROC SIMILARITY is now production status.
New TIMEID Procedure (Experimental)

The new TIMEID procedure analyzes the sequence of ID values in a SAS data set to identify the time interval between observations and verifies that the observations in the data set represent a properly spaced time series. The TIMEID procedure provides the following features:

- Specified time intervals and alignments can be used to evaluate a data set’s time ID values in terms of the distributions of duplicated values, alignment offsets, and the gaps between adjacent observations.

- The time interval’s width, shift, and alignment can be inferred from a time ID variable. When either the interval or its alignment is specified, this information is used to guide the process of inferring the remaining quantity.

- When multiple BY groups are present, detailed diagnostics for each BY group are reported in addition to summarized diagnostic information that applies to all BY groups in the data set.
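A minimal sketch of checking whether a date variable forms a properly spaced monthly series follows; the data set and variable names are hypothetical:

   /* Hypothetical time series: verify monthly spacing of the ID values. */
   proc timeid data=work.series;
      id date interval=month;
   run;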
TIMESERIES Procedure

Three features have been added to the TIMESERIES procedure: singular spectrum analysis, Fourier spectrum analysis, and native database accumulation of time series data.
Singular Spectrum Analysis

Singular spectrum analysis (SSA) is a technique for decomposing a time series into additive components and categorizing these components based on the magnitudes of their contributions. SSA uses a single parameter, the window length, to quantify patterns in a time series without relying on preconceived notions about the structure of the time series. The window length represents the maximum lag considered in the analysis and corresponds to the dimensionality of the PCA (principal components analysis) on which the SSA is based.
In addition to SSA output options, an SSA statement has been added to explicitly control the window length parameter and the grouping of SSA series components.
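The following sketch shows the new SSA statement; the data set is hypothetical, and the window length and grouping shown are purely illustrative:

   /* Hypothetical monthly series: SSA with a window length of 12,
      splitting the leading component from the remaining components. */
   proc timeseries data=work.sales plots=ssa;
      id date interval=month;
      var units;
      ssa / length=12 groups=(1)(2 3 4);   /* new SSA statement */
   run;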
Fourier Spectrum Analysis

Functionality similar to that available in PROC SPECTRA for analyzing periodograms of time series data has been incorporated into PROC TIMESERIES. Now ODS graphical representations of periodograms and spectral density estimates can be computed and displayed.
Database Accumulation

For Teradata-based input data sets, aggregation and accumulation can be performed by using native facilities in the database server. Most ACCUMULATE= options specified in the ID and VAR statements can be performed by the database server.
UCM Procedure

The ARMA model specification options in the IRREGULAR statement, which were experimental in SAS 9.2, are now production.
X12 Procedure

Many new features have been added to the X12 procedure:

- The CHECK statement produces statistics for diagnostic checking of residuals from the estimated regARIMA model. The following new tables are associated with the CHECK statement: “Autocorrelation of regARIMA Model Residuals,” “Partial Autocorrelation of regARIMA Model Residuals,” “Autocorrelation of Squared regARIMA Model Residuals,” “Summary Statistics for the Unstandardized Residuals,” “Normality Statistics for regARIMA Model Residuals,” and “Table G Rs: 10*LOG(SPECTRUM) of the regARIMA Model Residuals.” If ODS GRAPHICS ON is specified, the following new plots are associated with diagnostic checking output: the autocorrelation function (ErrorACF) plot of the residuals, the partial autocorrelation function (ErrorPACF) plot of the residuals, the autocorrelation function (SqErrorACF) plot of the squared residuals, a histogram (ResidualHistogram) of the residuals, and a spectral plot (SpectralPlot) of the residuals.

- The MAXLAG option of the IDENTIFY statement specifies the maximum number of lags for the sample ACF and PACF that are associated with model identification.

- The following tables are now available through the OUTPUT statement: E1, E2, E3, and E8.

- The SIGMALIM option of the X11 statement enables you to specify the upper and lower sigma limits that are used to identify and decrease the weight of extreme irregular values in the internal seasonal adjustment computations.

- The TYPE option of the X11 statement controls which factors are removed from the original series to produce the seasonally adjusted series (table D11) and also the final trend cycle (table D12).

- The OUTSTAT= option of the X12 statement specifies an optional output data set that contains the summary statistics related to each seasonally adjusted series. The data set is sorted by the BY-group variables, if any, and by series names.

- The PERIODOGRAM option of the X12 statement enables you to specify that the periodogram rather than the spectrum of the series be plotted in the G tables and plots.

- The PLOTS= option of the X12 statement controls the plots that are produced through ODS Graphics.

- The SPECTRUMSERIES option of the X12 statement specifies the table name of the series that is used in the spectrum of the original series (table G0). The table names that can be specified are A1, A19, B1, or E1. The default is B1.

- The following tables are now available through the TABLES statement: E1, E2, and E3.

- The following tables are now available through ODS: “Model Description for ARIMA Model Identification,” “Model Description for ARIMA Model Estimation,” “Final Seasonal Filter Selection via Global MSR,” “Seasonal Filters by Period,” and “Final Trend Cycle Statistics.” The model description information was previously displayed in notes; an ODS table enables you to export the information to a data set. The seasonal filter and trend filter tables are new.

- Auxiliary variables have been added to the ACF and PACF data sets that are available through ODS OUTPUT. The following variables have been added: _NAME_, Transform, Adjust, Regressors, Diff, and Sdiff. The purpose of the new variables is to help you identify the source of the data when multiple ACFs and PACFs are calculated.

The following new feature is experimental:

- The AUXDATA= option of the X12 statement specifies an auxiliary input data set that can contain user-defined variables specified in the INPUT statement, the USERVAR= option of the REGRESSION statement, or the USERDEFINED statement. The AUXDATA= option is useful when user-defined regressors are used for multiple time series data sets or multiple BY groups.
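A sketch that combines several of these features follows; the data set is hypothetical, and the sigma-limit values are illustrative:

   /* Hypothetical monthly sales series: residual diagnostics via the
      new CHECK statement and sigma limits on the X11 statement. */
   proc x12 data=work.sales date=date;
      var sales;
      transform function=log;
      arima model=((0,1,1)(0,1,1));
      estimate;
      check;                              /* new CHECK statement   */
      x11 sigmalim=(1.8, 2.8);            /* new SIGMALIM option   */
   run;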
SAS/ETS Model Editor Application (Experimental)

A new interactive application, the SAS/ETS Model Editor, enables you to define, fit, and simulate nonlinear statistical models by using the MODEL procedure. The SAS/ETS Model Editor enables you to use the powerful features of PROC MODEL through a convenient and interactive graphical user interface.
Date Intervals, Formats, and Functions

The custom time intervals that are available in Base SAS software can be used in SAS/ETS procedures. Custom time intervals enable you to specify beginning and ending dates and seasonality for time intervals according to any definition. Such intervals can be used to define the following:

- fiscal intervals such as monthly intervals that begin on a day other than the first day of the month (for example, intervals that begin on the 10th day of each month)

- fiscal intervals such as monthly intervals that begin on different days for different months (for example, March of 2000 can begin on March 10, but April of 2000 can begin on April 12)

- business days, such as banking days that exclude holidays

- hourly intervals that omit hours that the business is closed
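A sketch of a custom banking-day interval built with the Base SAS INTERVALDS= system option follows; the data set and interval names are hypothetical, and holidays are omitted for brevity:

   /* Build a data set of interval start dates (weekdays only)
      and register it as a custom interval named BANKDAYS. */
   data work.bankdays;
      format begin date9.;
      do begin = '01jan2010'd to '31dec2010'd;
         if 2 <= weekday(begin) <= 6 then output;   /* skip weekends */
      end;
   run;

   options intervalds=(bankdays=work.bankdays);

   data _null_;
      next = intnx('bankdays', '04jan2010'd, 5);    /* five banking days ahead */
      put next= date9.;
   run;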
Chapter 2
Introduction

Contents
   Overview of SAS/ETS Software
      Uses of SAS/ETS Software
      Contents of SAS/ETS Software
   About This Book
      Chapter Organization
      Typographical Conventions
   Where to Turn for More Information
      Accessing the SAS/ETS Sample Library
      Online Help System
      SAS Short Courses
      SAS Technical Support Services
   Major Features of SAS/ETS Software
      Discrete Choice and Qualitative and Limited Dependent Variable Analysis
      Regression with Autocorrelated and Heteroscedastic Errors
      Simultaneous Systems Linear Regression
      Linear Systems Simulation
      Polynomial Distributed Lag Regression
      Nonlinear Systems Regression and Simulation
      ARIMA (Box-Jenkins) and ARIMAX (Box-Tiao) Modeling and Forecasting
      Vector Time Series Analysis
      State Space Modeling and Forecasting
      Spectral Analysis
      Seasonal Adjustment
      Structural Time Series Modeling and Forecasting
      Time Series Cross-Sectional Regression Analysis
      Automatic Time Series Forecasting
      Time Series Interpolation and Frequency Conversion
      Trend and Seasonal Analysis on Transaction Databases
      Access to Financial and Economic Databases
      Spreadsheet Calculations and Financial Report Generation
      Loan Analysis, Comparison, and Amortization
      Time Series Forecasting System
      Investment Analysis System
      ODS Graphics
   Related SAS Software
      Base SAS Software
      SAS Forecast Studio
      SAS High-Performance Forecasting
      SAS/GRAPH Software
      SAS/STAT Software
      SAS/IML Software
      SAS/IML Stat Studio
      SAS/OR Software
      SAS/QC Software
      MLE for User-Defined Likelihood Functions
      JMP Software
      SAS Enterprise Guide
      SAS Add-In for Microsoft Office
      Enterprise Miner—Time Series nodes
      SAS Risk Products
   References
Overview of SAS/ETS Software

SAS/ETS software, a component of the SAS System, provides SAS procedures for the following:

- econometric analysis
- time series analysis
- time series forecasting
- systems modeling and simulation
- discrete choice analysis
- analysis of qualitative and limited dependent variable models
- seasonal adjustment of time series data
- financial analysis and reporting
- access to economic and financial databases
- time series data management

In addition to SAS procedures, SAS/ETS software also includes seamless access to economic and financial databases and interactive environments for time series forecasting and investment analysis.
Uses of SAS/ETS Software

SAS/ETS software provides tools for a wide variety of applications in business, government, and academia. Major uses of SAS/ETS procedures are economic analysis, forecasting, economic and financial modeling, time series analysis, financial reporting, and manipulation of time series data.

The common theme relating the many applications of the software is time series data: SAS/ETS software is useful whenever it is necessary to analyze or predict processes that take place over time or to analyze models that involve simultaneous relationships.

Although SAS/ETS software is most closely associated with business, finance, and economics, time series data also arise in many other fields. SAS/ETS software is useful whenever time dependencies, simultaneous relationships, or dynamic processes complicate data analysis. For example, an environmental quality study might use SAS/ETS software’s time series analysis tools to analyze pollution emissions data. A pharmacokinetic study might use SAS/ETS software’s features for nonlinear systems to model the dynamics of drug metabolism in different tissues.

The diversity of problems for which econometrics and time series analysis tools are needed is reflected in the applications reported by SAS users. The following list shows some applications of SAS/ETS software presented by SAS users at past annual conferences of the SAS Users Group International (SUGI):

- forecasting college enrollment (Calise and Earley 1997)
- fitting a pharmacokinetic model (Morelock et al. 1995)
- testing the interaction effect in reducing sudden infant death syndrome (Fleming, Gibson, and Fleming 1996)
- forecasting operational indices to measure productivity changes (McCarty 1994)
- spectral decomposition and reconstruction of nuclear plant signals (Hoyer and Gross 1993)
- estimating parameters for the constant-elasticity-of-substitution translog model (Hisnanick 1993)
- applying econometric analysis for mass appraisal of real property (Amal and Weselowski 1993)
- forecasting telephone usage data (Fishetti, Heathcote, and Perry 1993)
- forecasting demand and utilization of inpatient hospital services (Hisnanick 1992)
- using conditional demand estimation to determine electricity demand (Keshani and Taylor 1992)
- estimating tree biomass for measurement of forestry yields (Parresol and Thomas 1991)
- evaluating the theory of input separability in the production function of U.S. manufacturing (Hisnanick 1991)
- forecasting dairy milk yields and composition (Benseman 1990)
- predicting the gloss of coated aluminum products subject to weathering (Khan 1990)
- learning curve analysis for predicting manufacturing costs of aircraft (Le Bouton 1989)
- analyzing Dow Jones stock index trends (Early, Sweeney, and Zekavat 1989)
- analyzing the usefulness of the composite index of leading economic indicators for forecasting the economy (Lin and Myers 1988)
Contents of SAS/ETS Software

Procedures

SAS/ETS software includes the following SAS procedures:

ARIMA
ARIMA (Box-Jenkins) and ARIMAX (Box-Tiao) modeling and forecasting
AUTOREG
regression analysis with autocorrelated or heteroscedastic errors and ARCH and GARCH modeling
COMPUTAB
spreadsheet calculations and financial report generation
COUNTREG
regression modeling for dependent variables that represent counts
DATASOURCE
access to financial and economic databases
ENTROPY
maximum entropy-based regression
ESM
forecasting by using exponential smoothing models with optimized smoothing weights
EXPAND
time series interpolation, frequency conversion, and transformation of time series
FORECAST
automatic forecasting
LOAN
loan analysis and comparison
MDC
multinomial discrete choice analysis
MODEL
nonlinear simultaneous equations regression and nonlinear systems modeling and simulation
PANEL
panel data models
PDLREG
polynomial distributed lag regression
QLIM
qualitative and limited dependent variable analysis
SIMILARITY
similarity analysis of time series data for time series data mining
SIMLIN
linear systems simulation
SPECTRA
spectral and cross-spectral analysis
STATESPACE
state space modeling and automated forecasting of multivariate time series
SYSLIN
linear simultaneous equations models
TIMESERIES
analysis of time-stamped transactional data
TSCSREG
time series cross-sectional regression analysis
UCM
unobserved components analysis of time series
VARMAX
vector autoregressive and moving-average modeling and forecasting
X11
seasonal adjustment (Census X-11 and X-11 ARIMA)
X12
seasonal adjustment (Census X-12 ARIMA)
Macros

SAS/ETS software includes the following SAS macros:

%AR
generates statements to define autoregressive error models for the MODEL procedure
%BOXCOXAR
investigates Box-Cox transformations useful for modeling and forecasting a time series
%DFPVALUE
computes probabilities for Dickey-Fuller test statistics
%DFTEST
performs Dickey-Fuller tests for unit roots in a time series process
%LOGTEST
tests to determine whether a log transformation is appropriate for modeling and forecasting a time series
%MA
generates statements to define moving-average error models for the MODEL procedure
%PDL
generates statements to define polynomial distributed lag models for the MODEL procedure
These macros are part of the SAS AUTOCALL facility and are automatically available for use in your SAS program. Refer to SAS Macro Language: Reference for information about the SAS macro facility.
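For example, the %DFTEST macro can be invoked as in the following sketch; the data set and variable are hypothetical, and the macro stores its result (the p-value of the test) in the global macro variable &DFTEST:

   /* Hypothetical series: Dickey-Fuller test with three augmenting
      lags and a deterministic trend term. */
   %dftest(work.series, y, ar=3, trend=1);
   %put Dickey-Fuller p-value: &dftest;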
Access Interfaces to Economic and Financial Databases

In addition to PROC DATASOURCE, these SAS/ETS access interfaces provide seamless access to financial and economic databases:

SASECRSP
LIBNAME engine for accessing time series and event data residing in a CRSPAccess database.
SASEFAME
LIBNAME engine for accessing time or case series data residing in a FAME database.
SASEHAVR
LIBNAME engine for accessing time series residing in a HAVER ANALYTICS Data Link Express (DLX) database.
The Time Series Forecasting System

SAS/ETS software includes an interactive forecasting system, described in Part IV. This graphical user interface to SAS/ETS forecasting features was developed with SAS/AF software and uses PROC ARIMA and other internal routines to perform time series forecasting. The Time Series Forecasting System makes it easy to forecast time series and provides many features for graphical data exploration and graphical comparisons of forecasting models and forecasts. (You must have SAS/GRAPH® installed to use the graphical features of the system.)
The Investment Analysis System

The Investment Analysis System, described in Part VI, is an interactive environment for analyzing the time-value of money in a variety of investments. Various analyses are provided to help analyze the value of investment alternatives: time value, periodic equivalent, internal rate of return, benefit-cost ratio, and break-even analysis.
About This Book

This book is a user’s guide to SAS/ETS software. Since SAS/ETS software is a part of the SAS System, this book assumes that you are familiar with Base SAS software and have the books SAS Language Reference: Dictionary and Base SAS Procedures Guide available for reference. It also assumes that you are familiar with SAS data sets, the SAS DATA step, and with basic SAS procedures such as PROC PRINT and PROC SORT. Chapter 3, “Working with Time Series Data,” in this book summarizes the aspects of Base SAS software that are most relevant to the use of SAS/ETS software.
Chapter Organization

Following a brief What’s New, this book is divided into six major parts. Part I contains general information to aid you in working with SAS/ETS software. Part II explains the SAS procedures of SAS/ETS software. Part III describes the available data access interfaces for economic and financial databases. Part IV is the reference for the Time Series Forecasting System, an interactive forecasting menu system that uses PROC ARIMA and other routines to perform time series forecasting. Part V is the reference for the SAS/ETS Model Editor (Experimental). Finally, Part VI is the reference for the Investment Analysis System.

The new features added to SAS/ETS software since the publication of SAS/ETS Software: Changes and Enhancements for Release 8.2 are summarized in Chapter 1, “What’s New in SAS/ETS 9.22.” If you have used SAS/ETS software in the past, you may want to skim this chapter to see what’s new.

Part I contains the following chapters. Chapter 2, the current chapter, provides an overview of SAS/ETS software and summarizes related SAS publications, products, and services.
Chapter 3, “Working with Time Series Data,” discusses the use of SAS data management and programming features for time series data.

Chapter 4, “Date Intervals, Formats, and Functions,” summarizes the time intervals, date and datetime informats, date and datetime formats, and date and datetime functions available in the SAS System.

Chapter 5, “SAS Macros and Functions,” documents SAS macros and DATA step financial functions provided with SAS/ETS software. The macros use SAS/ETS procedures to perform Dickey-Fuller tests, test for the need for log transformations, or select optimal Box-Cox transformation parameters for time series data.

Chapter 6, “Nonlinear Optimization Methods,” documents the nonlinear optimization subsystem used by some SAS/ETS procedures to perform nonlinear optimization tasks.

Part II contains chapters that explain the SAS procedures that make up SAS/ETS software. These chapters appear in alphabetical order by procedure name. Part III contains chapters that document the ETS access interfaces to economic and financial databases.

Each of the chapters that document the SAS/ETS procedures (Part II) and the SAS/ETS access interfaces (Part III) is organized as follows:

1. The “Overview” section gives a brief description of the procedure.
2. The “Getting Started” section provides a tutorial introduction on how to use the procedure.
3. The “Syntax” section is a reference to the SAS statements and options that control the procedure.
4. The “Details” section discusses various technical details.
5. The “Examples” section contains examples of the use of the procedure.
6. The “References” section contains technical references on methodology.

Part IV contains the chapters that document the features of the Time Series Forecasting System. Part V documents the SAS/ETS Model Editor. Part VI contains chapters that document the features of the Investment Analysis System.
Typographical Conventions

This book uses several type styles for presenting information. The following list explains the meaning of the typographical conventions used in this book:

roman             is the standard type style used for most text.

UPPERCASE ROMAN   is used for SAS statements, options, and other SAS language elements when they appear in the text. However, you can enter these elements in your own SAS programs in lowercase, uppercase, or a mixture of the two.

UPPERCASE BOLD    is used in the “Syntax” sections’ initial lists of SAS statements and options.

oblique           is used for user-supplied values for options in the syntax definitions. In the text, these values are written in italic.

helvetica         is used for the names of variables and data sets when they appear in the text.

bold              is used to refer to matrices and vectors and to refer to commands.

italic            is used for terms that are defined in the text, for emphasis, and for references to publications.

bold monospace    is used for example code. In most cases, this book uses lowercase type for SAS statements.
Where to Turn for More Information

This section describes other sources of information about SAS/ETS software.
Accessing the SAS/ETS Sample Library

The SAS/ETS Sample Library includes many examples that illustrate the use of SAS/ETS software, including the examples used in this documentation. To access these sample programs, select Help from the menu and then select SAS Help and Documentation. From the Contents list, select the section Sample SAS Programs under Learning to Use SAS.
Online Help System

You can access online help information about SAS/ETS software in two ways, depending on whether you are using the SAS windowing environment in the command line mode or the pull-down menu mode.

If you are using a command line, you can access the SAS/ETS help menus by typing help on the SAS windowing environment command line. Or you can issue the command help ARIMA (or another procedure name) to display the help for that particular procedure.

If you are using the SAS windowing environment pull-down menus, you can pull down the Help menu and make the following selections:

- SAS Help and Documentation
- Learning to Use SAS in the Contents list
- SAS Products
- SAS/ETS

The content of the Online Help System follows closely that of this book.
SAS Short Courses

The SAS Education Division offers a number of training courses that might be of interest to SAS/ETS users. Please check the SAS web site for the current list of available training courses.
SAS Technical Support Services

As with all SAS products, the SAS Technical Support staff is available to respond to problems and answer technical questions regarding the use of SAS/ETS software.
Major Features of SAS/ETS Software

The following sections briefly summarize major features of SAS/ETS software. See the chapters on individual procedures for more detailed information.
Discrete Choice and Qualitative and Limited Dependent Variable Analysis

The MDC procedure provides maximum likelihood (ML) or simulated maximum likelihood estimates of multinomial discrete choice models in which the choice set consists of unordered multiple alternatives. The MDC procedure supports the following models and features:

- conditional logit
- nested logit
- heteroscedastic extreme value
- multinomial probit
- mixed logit
- pseudo-random or quasi-random numbers for simulated maximum likelihood estimation
- bounds imposed on the parameter estimates
- linear restrictions imposed on the parameter estimates
- SAS data set containing predicted probabilities and linear predictor (x′β) values
- decision tree and nested logit
- model fit and goodness-of-fit measures including
  – likelihood ratio
  – Aldrich-Nelson
  – Cragg-Uhler 1
  – Cragg-Uhler 2
  – Estrella
  – Adjusted Estrella
  – McFadden’s LRI
  – Veall-Zimmermann
  – Akaike Information Criterion (AIC)
  – Schwarz Criterion or Bayesian Information Criterion (BIC)

The QLIM procedure analyzes univariate and multivariate limited dependent variable models where dependent variables take discrete values or dependent variables are observed only in a limited range of values. This procedure includes logit, probit, Tobit, and general simultaneous equations models. The QLIM procedure supports the following models:

- linear regression model with heteroscedasticity
- probit with heteroscedasticity
- logit with heteroscedasticity
- Tobit (censored and truncated) with heteroscedasticity
- Box-Cox regression with heteroscedasticity
- bivariate probit
- bivariate Tobit
- sample selection models
- multivariate limited dependent models

The COUNTREG procedure provides regression models in which the dependent variable takes nonnegative integer count values. The COUNTREG procedure supports the following models:

- Poisson regression
- negative binomial regression with quadratic and linear variance functions
- zero-inflated Poisson (ZIP) model
- zero-inflated negative binomial (ZINB) model
- fixed and random effect Poisson panel data models
- fixed and random effect NB (negative binomial) panel data models

The PANEL procedure deals with panel data sets that consist of time series observations on each of several cross-sectional units. The models and methods that the PANEL procedure supports are as follows:

- one-way and two-way models
- fixed and random effects
- autoregressive models
  – the Parks method
  – dynamic panel estimator
  – the Da Silva method for moving-average disturbances
Regression with Autocorrelated and Heteroscedastic Errors

The AUTOREG procedure provides regression analysis and forecasting of linear models with autocorrelated or heteroscedastic errors. The AUTOREG procedure includes the following features:

- estimation and prediction of linear regression models with autoregressive errors
- any order autoregressive or subset autoregressive process
- optional stepwise selection of autoregressive parameters
- choice of the following estimation methods:
  – exact maximum likelihood
  – exact nonlinear least squares
  – Yule-Walker
  – iterated Yule-Walker
- tests for any linear hypothesis that involves the structural coefficients
- restrictions for any linear combination of the structural coefficients
- forecasts with confidence limits
- estimation and forecasting of ARCH (autoregressive conditional heteroscedasticity), GARCH (generalized autoregressive conditional heteroscedasticity), I-GARCH (integrated GARCH), E-GARCH (exponential GARCH), and GARCH-M (GARCH in mean) models
- combination of ARCH and GARCH models with autoregressive models, with or without regressors
- estimation and testing of general heteroscedasticity models
- variety of model diagnostic information including the following:
  – autocorrelation plots
  – partial autocorrelation plots
  – Durbin-Watson test statistic and generalized Durbin-Watson tests to any order
  – Durbin h and Durbin t statistics
  – Akaike information criterion
  – Schwarz information criterion
  – tests for ARCH errors
  – Ramsey’s RESET test
  – Chow and PChow tests
  – Phillips-Perron stationarity test
  – CUSUM and CUSUMSQ statistics
- exact significance levels (p-values) for the Durbin-Watson statistic
- embedded missing values
Simultaneous Systems Linear Regression The SYSLIN and ENTROPY procedures provide regression analysis of a simultaneous system of linear equations. The SYSLIN procedure includes the following features: estimation of parameters in simultaneous systems of linear equations full range of estimation methods including the following:
Simultaneous Systems Linear Regression F 27
– ordinary least squares (OLS) – two-stage least squares (2SLS) – three-stage least squares (3SLS) – iterated 3SLS (IT3SLS) – seemingly unrelated regression (SUR) – iterated SUR (ITSUR) – limited-information maximum likelihood (LIML) – full-information maximum likelihood (FIML) – minimum expected loss (MELO) – general K-class estimators weighted regression any number of restrictions for any linear combination of coefficients, within a single model or across equations tests for any linear hypothesis, for the parameters of a single model or across equations wide range of model diagnostics and statistics including the following: – usual ANOVA tables and R-square statistics – Durbin-Watson statistics – standardized coefficients – test for overidentifying restrictions – residual plots – standard errors and t tests – covariance and correlation matrices of parameter estimates and equation errors predicted values, residuals, parameter estimates, and variance-covariance matrices saved in output SAS data sets other features of the SYSLIN procedure that enable you to do the following: – impose linear restrictions on the parameter estimates – test linear hypotheses about the parameters – write predicted and residual values to an output SAS data set – write parameter estimates to an output SAS data set – write the crossproducts matrix (SSCP) to an output SAS data set – use raw data, correlations, covariances, or cross products as input The ENTROPY procedure supports the following models and features: generalized maximum entropy (GME) estimation
The ENTROPY procedure supports the following models and features:

• generalized maximum entropy (GME) estimation
• generalized cross entropy (GCE) estimation
• normed moment generalized maximum entropy
• maximum entropy-based seemingly unrelated regression (MESUR) estimation
• pure inverse estimation
• estimation of parameters in simultaneous systems of linear equations
• Markov models
• unordered multinomial choice problems
• weighted regression
• any number of restrictions for any linear combination of coefficients, within a single model or across equations
• tests for any linear hypothesis, for the parameters of a single model or across equations
Linear Systems Simulation

The SIMLIN procedure performs simulation and multiplier analysis for simultaneous systems of linear regression models. The SIMLIN procedure includes the following features:

• reduced form coefficients
• interim multipliers
• total multipliers
• dynamic multipliers
• multipliers for higher order lags
• dynamic forecasts and simulations
• goodness-of-fit statistics
• acceptance of the equation system coefficients estimated by the SYSLIN procedure as input
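For instance, after PROC SYSLIN writes its estimates to an OUTEST= data set, a call such as the following (all names are illustrative) could compute interim and total multipliers from those estimates:

proc syslin data=work.model_data 3sls outest=est3;
   endogenous y1 y2;
   instruments x1 x2;
   eq1: model y1 = y2 x1;
   eq2: model y2 = y1 x2;
run;

proc simlin data=work.model_data est=est3 type=3sls interim=2 total;
   endogenous y1 y2;   /* variables solved by the system */
   exogenous x1 x2;    /* inputs to the system */
run;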
Polynomial Distributed Lag Regression

The PDLREG procedure provides regression analysis for linear models with polynomial distributed (Almon) lags. The PDLREG procedure includes the following features:
• entry of any number of regressors as a polynomial lag distribution and the use of any number of covariates
• use of any order lag length and degree polynomial for lag distribution
• optional upper and lower endpoint restrictions
• specification of any number of linear restrictions on covariates
• option to repeat analysis over a range of degrees for the lag distribution polynomials
• support for autoregressive errors to any lag
• forecasts with confidence limits
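A minimal sketch (names hypothetical): regress y on a 6-lag Almon distribution of x with a degree-3 polynomial, allowing AR(1) errors:

proc pdlreg data=work.macro;
   /* x(6,3): 6 lags of x constrained to a degree-3 polynomial */
   model y = x(6,3) / nlag=1;
run;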
Nonlinear Systems Regression and Simulation

The MODEL procedure provides parameter estimation, simulation, and forecasting of dynamic nonlinear simultaneous equation models. The MODEL procedure includes the following features:

• nonlinear regression analysis for systems of simultaneous equations, including weighted nonlinear regression
• full range of parameter estimation methods including the following:
   – nonlinear ordinary least squares (OLS)
   – nonlinear seemingly unrelated regression (SUR)
   – nonlinear two-stage least squares (2SLS)
   – nonlinear three-stage least squares (3SLS)
   – iterated SUR
   – iterated 3SLS
   – generalized method of moments (GMM)
   – nonlinear full-information maximum likelihood (FIML)
   – simulated method of moments (SMM)
• support for dynamic multi-equation nonlinear models of any size or complexity
• use of the full power of the SAS programming language for model definition, including left-hand-side expressions
• hypothesis tests of nonlinear functions of the parameter estimates
• linear and nonlinear restrictions of the parameter estimates
• bounds imposed on the parameter estimates
• computation of estimates and standard errors of nonlinear functions of the parameter estimates
• estimation and simulation of ordinary differential equations (ODEs)
• vector autoregressive error processes and polynomial lag distributions easily specified for the nonlinear equations
• variance modeling (ARCH, GARCH, and others)
• computation of goal-seeking solutions of nonlinear systems to find input values needed to produce target outputs
• dynamic, static, or n-period-ahead-forecast simulation modes
• simultaneous solution or single equation solution modes
• Monte Carlo simulation using parameter estimate covariance and across-equation residuals covariance matrices or user-specified random functions
• a variety of diagnostic statistics including the following:
   – model R-square statistics
   – general Durbin-Watson statistics and exact p-values
   – asymptotic standard errors and t tests
   – first-stage R-square statistics
   – covariance estimates
   – collinearity diagnostics
   – simulation goodness-of-fit statistics
   – Theil inequality coefficient decompositions
   – Theil relative change forecast error measures
   – heteroscedasticity tests
   – Godfrey test for serial correlation
   – Hausman specification test
   – Chow tests
• block structure and dependency structure analysis for the nonlinear system
• listing and cross-reference of fitted model
• automatic calculation of needed derivatives by using exact analytic formulas
• efficient sparse matrix methods used for model solution; choice of other solution methods

Model definition, parameter estimation, simulation, and forecasting can be performed interactively in a single SAS session, or models can be stored in files and reused and combined in later runs.
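As a brief sketch (the data set and names are hypothetical), PROC MODEL can fit a nonlinear equation and then simulate the fitted model:

proc model data=work.growth;
   parms b0 b1;                              /* parameters to estimate */
   y = b0 * (1 - exp(-b1 * x));              /* nonlinear mean function */
   fit y;                                    /* OLS is the default method */
   solve y / data=work.growth simulate out=work.sim;   /* simulate the model */
run;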
ARIMA (Box-Jenkins) and ARIMAX (Box-Tiao) Modeling and Forecasting

The ARIMA procedure provides the identification, parameter estimation, and forecasting of autoregressive integrated moving-average (Box-Jenkins) models, seasonal ARIMA models, transfer function models, and intervention models. The ARIMA procedure includes the following features:

• complete ARIMA (Box-Jenkins) modeling with no limits on the order of autoregressive or moving-average processes
• model identification diagnostics including the following:
   – autocorrelation function
   – partial autocorrelation function
   – inverse autocorrelation function
   – cross-correlation function
   – extended sample autocorrelation function
   – minimum information criterion for model identification
   – squared canonical correlations
• stationarity tests
• outlier detection
• intervention analysis
• regression with ARMA errors
• transfer function modeling with fully general rational transfer functions
• seasonal ARIMA models
• ARIMA model-based interpolation of missing values
• several parameter estimation methods including the following:
   – exact maximum likelihood
   – conditional least squares
   – exact nonlinear unconditional least squares (ELS or ULS)
• prewhitening transformations
• forecasts and confidence limits for all models
• forecasting tied to parameter estimation methods: finite memory forecasts for models estimated by maximum likelihood or exact nonlinear least squares methods, and infinite memory forecasts for models estimated by conditional least squares
• diagnostic statistics to help judge the adequacy of the model including the following:
   – Akaike's information criterion (AIC)
   – Schwarz's Bayesian criterion (SBC or BIC)
   – Box-Ljung chi-square test statistics for white-noise residuals
   – autocorrelation function of residuals
   – partial autocorrelation function of residuals
   – inverse autocorrelation function of residuals
   – automatic outlier detection
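For example, a typical PROC ARIMA session (the data set and variable names are hypothetical) identifies, estimates, and forecasts an ARIMA(1,1,1) model:

proc arima data=work.series;
   identify var=sales(1) nlag=12;     /* first-difference sales; show 12 lags */
   estimate p=1 q=1 method=ml;        /* ARIMA(1,1,1) by exact maximum likelihood */
   forecast lead=12 id=date interval=month out=work.fcst;
run;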
Vector Time Series Analysis

The VARMAX procedure enables you to model the dynamic relationships both among the dependent variables and between the dependent and independent variables. The VARMAX procedure includes the following features:

• several modeling features:
   – vector autoregressive model
   – vector autoregressive model with exogenous variables
   – vector autoregressive and moving-average model
   – Bayesian vector autoregressive model
   – vector error correction model
   – Bayesian vector error correction model
   – GARCH-type multivariate conditional heteroscedasticity models
• criteria for automatically determining AR and MA orders:
   – Akaike information criterion (AIC)
   – corrected AIC (AICC)
   – Hannan-Quinn (HQ) criterion
   – final prediction error (FPE)
   – Schwarz Bayesian criterion (SBC), also known as Bayesian information criterion (BIC)
• AR order identification aids:
   – partial cross-correlations
   – Yule-Walker estimates
   – partial autoregressive coefficients
   – partial canonical correlations
• tests for the presence of unit roots and cointegration:
   – Dickey-Fuller tests
   – Johansen cointegration test for nonstationary vector processes of integrated order one
   – Stock-Watson common trends test for the possibility of cointegration among nonstationary vector processes of integrated order one
   – Johansen cointegration test for nonstationary vector processes of integrated order two
• model parameter estimation methods:
   – least squares (LS)
   – maximum likelihood (ML)
• model checks and residual analysis using the following tests:
   – Durbin-Watson (DW) statistics
   – F test for autoregressive conditional heteroscedastic (ARCH) disturbances
   – F test for AR disturbances
   – Jarque-Bera normality test
   – Portmanteau test
• seasonal deterministic terms
• subset models
• multiple regression with distributed lags
• dead-start model that does not have present values of the exogenous variables
• Granger-causal relationships between two distinct groups of variables
• infinite order AR representation
• impulse response function (or infinite order MA representation)
• decomposition of the predicted error covariances
• roots of the characteristic functions for both the AR and MA parts to evaluate the proximity of the roots to the unit circle
• contemporaneous relationships among the components of the vector time series
• forecasts
• conditional covariances for GARCH models
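A minimal sketch (hypothetical names): fit a bivariate VAR(2) and produce eight-step-ahead forecasts:

proc varmax data=work.macro2;
   model y1 y2 / p=2 lagmax=6;     /* VAR(2); diagnostics up to lag 6 */
   output lead=8 out=work.vfcst;   /* 8-step-ahead forecasts */
run;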
State Space Modeling and Forecasting

The STATESPACE procedure provides automatic model selection, parameter estimation, and forecasting of state space models. (State space models encompass an alternative general formulation of multivariate ARIMA models.) The STATESPACE procedure includes the following features:

• multivariate ARIMA modeling by using the general state space representation of the stochastic process
• automatic model selection using Akaike's information criterion (AIC)
• user-specified state space models including restrictions
• transfer function models with random inputs
• any combination of simple and seasonal differencing; input series can be differenced to any order for any lag lengths
• forecasts with confidence limits
• ability to save the selected and fitted model in a data set and reuse it for forecasting
• wide range of output options including the ability to print any statistics concerning the data and their covariance structure, the model selection process, and the final model fit
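For example (names hypothetical), automatic state space modeling of two differenced series with 10-step-ahead forecasts:

proc statespace data=work.pair out=work.ssfcst lead=10;
   var x(1) y(1);   /* first-difference both series before modeling */
   id date;
run;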
Spectral Analysis

The SPECTRA procedure provides spectral analysis and cross-spectral analysis of time series. The SPECTRA procedure includes the following features:

• efficient calculation of the periodogram and smoothed periodogram using fast finite Fourier transform and Chirp-Z algorithms
• multiple spectral analysis, including raw and smoothed spectral and cross-spectral function estimates, with user-specified window weights
• choice of kernel for smoothing
• output of the following spectral estimates to a SAS data set:
   – Fourier sine and cosine coefficients
   – periodogram
   – smoothed periodogram
   – cospectrum
   – quadrature spectrum
   – amplitude
   – phase spectrum
   – squared coherency
• Fisher's Kappa and Bartlett's Kolmogorov-Smirnov test statistic for testing a null hypothesis of white noise
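A short sketch (hypothetical names): compute the periodogram and a smoothed spectral density estimate, and test for white noise:

proc spectra data=work.signal out=work.spec p s adjmean whitetest;
   var x;
   weights 1 2 3 4 3 2 1;   /* triangular smoothing window */
run;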
Seasonal Adjustment

The X11 procedure provides seasonal adjustment of time series by using the Census X-11 or X-11 ARIMA method. The X11 procedure is based on the U.S. Bureau of the Census X-11 seasonal adjustment program and also supports the X-11 ARIMA method developed by Statistics Canada. The X11 procedure includes the following features:

• decomposition of monthly or quarterly series into seasonal, trend, trading day, and irregular components
• both multiplicative and additive forms of the decomposition
• all the features of the Census Bureau program
• support of the X-11 ARIMA method
• support of sliding spans analysis
• processing of any number of variables at once with no maximum length for a series
• computation of tests for stable, moving, and combined seasonality
• optional printing or storing in SAS data sets of the individual X-11 tables that show the various components at different stages of the computation; full control over what is printed or output
• ability to project the seasonal component one year ahead, which enables reintroduction of seasonal factors for an extrapolated series

The X12 procedure provides seasonal adjustment of time series by using the X-12 ARIMA method. The X12 procedure is based on the U.S. Bureau of the Census X-12 ARIMA seasonal adjustment program (version 0.3). It also supports the X-11 ARIMA method developed by Statistics Canada and the previous X-11 method of the U.S. Census Bureau. The X12 procedure includes the following features:

• decomposition of monthly or quarterly series into seasonal, trend, trading day, and irregular components
• support of multiplicative, additive, pseudo-additive, and log-additive forms of decomposition
• support of the X-12 ARIMA method
• support of regARIMA modeling
• automatic identification of outliers
• support of TRAMO-based automatic model selection
• use of regressors to process missing values within the span of the series
• processing of any number of variables at once with no maximum length for a series
• computation of tests for stable, moving, and combined seasonality
• spectral analysis of original, seasonally adjusted, and irregular series
• optional printing or storing in a SAS data set of the individual X-11 tables that show the various components at different stages of the decomposition; full control over what is printed or output
• optional projection of the seasonal component one year ahead, which enables reintroduction of seasonal factors for an extrapolated series
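For instance (the data set and variable are hypothetical), seasonally adjust a monthly series with a log transformation and the airline model:

proc x12 data=work.monthly date=date;
   var sales;
   transform function=log;          /* log transformation */
   arima model=((0,1,1)(0,1,1));    /* airline model */
   x11;                             /* request the X-11 decomposition tables */
run;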
Structural Time Series Modeling and Forecasting

The UCM procedure provides a flexible environment for analyzing time series data using structural time series models, also called unobserved components models (UCMs). These models represent the observed series as a sum of suitably chosen components such as trend, seasonal, cyclical, and regression effects. You can use the UCM procedure to formulate comprehensive models that bring out all the salient features of the series under consideration. Structural models are applicable in the same situations where Box-Jenkins ARIMA models are applicable; however, structural models tend to be more informative about the underlying stochastic structure of the series. The UCM procedure includes the following features:

• general unobserved components modeling where the models can include trend, multiple seasons and cycles, and regression effects
• maximum-likelihood estimation of the model parameters
• model diagnostics that include a variety of goodness-of-fit statistics and extensive graphical diagnosis of the model residuals
• forecasts and confidence limits for the series and all the model components
• model-based seasonal decomposition
• extensive plotting capability that includes the following:
   – forecast and confidence interval plots for the series and model components such as trend, cycles, and seasons
   – diagnostic plots such as residual plots, residual autocorrelation plots, and so on
   – seasonal decomposition plots such as trend, trend plus cycles, trend plus cycles plus seasons, and so on
• model-based interpolation of series missing values
• full sample (also called smoothed) estimates of the model components
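A minimal sketch (hypothetical names): decompose a monthly series into level, slope, and trigonometric seasonal components and forecast one year ahead:

proc ucm data=work.monthly;
   id date interval=month;
   model logsales;
   irregular;                     /* irregular component */
   level;                         /* local level component */
   slope;                         /* local slope component */
   season length=12 type=trig;    /* monthly trigonometric seasonal */
   estimate;
   forecast lead=12;
run;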
Time Series Cross-Sectional Regression Analysis

The TSCSREG procedure provides combined time series cross-sectional regression analysis. The TSCSREG procedure includes the following features:

• estimation of the regression parameters under several common error structures:
   – Fuller and Battese method (variance component model)
   – Wansbeek-Kapteyn method
   – Parks method (autoregressive model)
   – Da Silva method (mixed variance component moving-average model)
   – one-way fixed effects
   – two-way fixed effects
   – one-way random effects
   – two-way random effects
• any number of model specifications
• unbalanced panel data for the fixed-effects or random-effects models
• variety of estimates and statistics including the following:
   – underlying error components estimates
   – regression parameter estimates
   – standard errors of estimates
   – t tests
   – R-square statistic
   – correlation matrix of estimates
   – covariance matrix of estimates
   – autoregressive parameter estimate
   – cross-sectional components estimates
   – autocovariance estimates
   – F tests of linear hypotheses about the regression parameters
   – specification tests
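For example (a hypothetical panel data set with cross-section variable state and time variable year), the Fuller-Battese variance components estimator:

proc tscsreg data=work.panel;
   id state year;              /* cross-section and time identifiers */
   model y = x1 x2 / fuller;   /* Fuller-Battese variance components method */
run;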
Automatic Time Series Forecasting

The ESM procedure provides a quick way to generate forecasts for many time series or transactional data in one step by using exponential smoothing methods. All parameters associated with the forecasting model are optimized based on the data. You can use the following smoothing models:

• simple
• double
• linear
• damped trend
• seasonal
• Winters method (additive and multiplicative)

Additionally, PROC ESM can transform the data before applying the smoothing methods by using any of these transformations:

• log
• square root
• logistic
• Box-Cox

In addition to forecasting, the ESM procedure can also produce graphical output.

The ESM procedure can forecast both time series data, whose observations are equally spaced at a specific time interval (for example, monthly, weekly), and transactional data, whose observations are not spaced with respect to any particular time interval. (Internet, inventory, sales, and similar data are typical examples of transactional data. For transactional data, the data are accumulated based on a specified time interval to form a time series.)

The ESM procedure is a replacement for the older FORECAST procedure. ESM is often more convenient to use than PROC FORECAST, but it supports only exponential smoothing models.
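A brief sketch (hypothetical names): forecast a monthly series 12 periods ahead with the multiplicative Winters method:

proc esm data=work.monthly out=work.esmfcst lead=12;
   id date interval=month;
   forecast units / model=winters;   /* multiplicative Winters smoothing */
run;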
The FORECAST procedure provides forecasting of univariate time series using automatic trend extrapolation. PROC FORECAST is an easy-to-use procedure for automatic forecasting that uses simple popular methods that do not require statistical modeling of the time series, such as exponential smoothing, time trend with autoregressive errors, and the Holt-Winters method. The FORECAST procedure supplements the powerful forecasting capabilities of the econometric and time series analysis procedures described previously. You can use PROC FORECAST when you have many series to forecast and you want to extrapolate trends without developing a model for each series.

The FORECAST procedure includes the following features:

• choice of the following forecasting methods:
   – EXPO method—exponential smoothing: single, double, triple, or Holt two-parameter smoothing
   – exponential smoothing as an ARIMA model
   – WINTERS method—using updating equations similar to exponential smoothing to fit model parameters
   – ADDWINTERS method—like the WINTERS method except that the seasonal parameters are added to the trend instead of multiplied with the trend
   – STEPAR method—stepwise autoregressive models with constant, linear, or quadratic trend and autoregressive errors to any order
   – Holt-Winters forecasting method with constant, linear, or quadratic trend
   – additive variant of the Holt-Winters method
• support for up to three levels of seasonality for the Holt-Winters method: time-of-year, day-of-week, or time-of-day
• ability to forecast any number of variables at once
• forecast confidence limits for all methods
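For example (hypothetical names), stepwise autoregressive forecasting of a monthly series with confidence limits:

proc forecast data=work.past interval=month lead=12
              method=stepar trend=2 out=work.pred outlimit;
   id date;
   var sales;
run;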
Time Series Interpolation and Frequency Conversion

The EXPAND procedure provides time interval conversion and missing value interpolation for time series. The EXPAND procedure includes the following features:

• conversion of time series frequency; for example, constructing quarterly estimates from annual series or aggregating quarterly values to annual values
• conversion of irregular observations to periodic observations
• interpolation of missing values in time series
• conversion of observation types; for example, estimating stocks from flows and vice versa. All possible conversions are supported between any of the following:
   – beginning of period
   – end of period
   – period midpoint
   – period total
   – period average
• conversion of time series phase shift; for example, conversion between fiscal years and calendar years
• identifying observations including the following:
   – identification of the time interval of the input values
   – validation of the input data set observations
   – computation of the ID values for the observations in the output data set
• choice of four interpolation methods:
   – cubic splines
   – linear splines
   – step functions
   – simple aggregation
• ability to perform extrapolation by a linear projection of the trend of the cubic spline curve fit to the input data
• ability to transform series before and after interpolation (or without interpolation) by using any of the following:
   – constant shift or scale
   – sign change or absolute value
   – logarithm, exponential, square root, square, logistic, inverse logistic
   – lags, leads, differences
   – classical decomposition
   – bounds, trims, reverse series
   – centered moving, cumulative, or backward moving average
   – centered moving, cumulative, or backward moving range
   – centered moving, cumulative, or backward moving geometric mean
   – centered moving, cumulative, or backward moving maximum
   – centered moving, cumulative, or backward moving median
   – centered moving, cumulative, or backward moving minimum
   – centered moving, cumulative, or backward moving product
   – centered moving, cumulative, or backward moving corrected sum of squares
   – centered moving, cumulative, or backward moving uncorrected sum of squares
   – centered moving, cumulative, or backward moving rank
   – centered moving, cumulative, or backward moving standard deviation
   – centered moving, cumulative, or backward moving sum
   – centered moving, cumulative, or backward moving t-value
   – centered moving, cumulative, or backward moving variance
• support for a wide range of time series frequencies:
   – YEAR
   – SEMIYEAR
   – QUARTER
   – MONTH
   – SEMIMONTH
   – TENDAY
   – WEEK
   – WEEKDAY
   – DAY
   – HOUR
   – MINUTE
   – SECOND
• support for repeating or shifting the basic interval types to define a great variety of different frequencies, such as fiscal years, biennial periods, work shifts, and so forth

Refer to Chapter 3, “Working with Time Series Data,” and Chapter 4, “Date Intervals, Formats, and Functions,” for more information about time series data transformations.
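For instance (hypothetical names), convert an annual flow series to quarterly estimates by cubic spline interpolation:

proc expand data=work.annual out=work.quarterly from=year to=qtr;
   id date;
   convert gdp / observed=total method=spline;   /* distribute annual totals */
run;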
Trend and Seasonal Analysis on Transaction Databases

The TIMESERIES procedure can accumulate transactional data to time series and perform trend and seasonal analysis on the accumulated time series. Time series analyses performed by the TIMESERIES procedure include the following:

• descriptive statistics relevant for time series data
• seasonal decomposition and seasonal adjustment analysis
• correlation analysis
• cross-correlation analysis

The TIMESERIES procedure includes the following features:

• ability to process large amounts of time-stamped transactional data
• statistical methods useful for large-scale time series analysis or (temporal) data mining
• output data sets stored in either a time series format (default) or a coordinate format (transposed)

The TIMESERIES procedure is normally used to prepare data for subsequent analysis that uses other SAS/ETS procedures or other parts of the SAS System. The time series format is most useful when the data are to be analyzed with SAS/ETS procedures. The coordinate format is most useful when the data are to be analyzed with SAS/STAT® procedures or SAS Enterprise Miner™. (For example, clustering time-stamped transactional data can be achieved by using the results of the TIMESERIES procedure with the clustering procedures of SAS/STAT and the nodes of SAS Enterprise Miner.)
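A minimal sketch (hypothetical names): accumulate time-stamped transactions into a monthly series and request seasonal decomposition:

proc timeseries data=work.trans out=work.monthly outdecomp=work.decomp;
   id datetime interval=month accumulate=total;   /* sum transactions per month */
   var amount;
run;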
Access to Financial and Economic Databases

The DATASOURCE procedure and the SAS/ETS data access interface LIBNAME engines (SASECRSP, SASEFAME, and SASEHAVR) provide seamless, efficient access to time series data from data files supplied by a variety of commercial and governmental data vendors. The DATASOURCE procedure includes the following features:

• support for data files distributed by the following data vendors:
   – DRI/McGraw-Hill
   – FAME Information Services
   – HAVER ANALYTICS
   – Standard & Poor's Compustat Services
   – Center for Research in Security Prices (CRSP)
   – International Monetary Fund
   – U.S. Bureau of Labor Statistics
   – U.S. Bureau of Economic Analysis
   – Organization for Economic Cooperation and Development (OECD)
• ability to select the series, frequency, time range, and cross sections of extracted data
• ability to create an output data set that contains descriptive information on the series available in the data file
• ability to read EBCDIC data on ASCII systems and vice versa

The SASECRSP interface LIBNAME engine includes the following features:

• enables random access to time series data residing in CRSPAccess databases
• provides a seamless interface between CRSP and SAS data processing
• uses the LIBNAME statement to enable you to specify which time series to read from the CRSPAccess database and how to perform the selection
• enables access to CRSP Stock, CRSP/COMPUSTAT Merged (CCM), and CRSP Indices data
• provides convenient formats, informats, and functions for CRSP and SAS datetime conversions

The SASEFAME interface LIBNAME engine includes the following features:

• provides SAS and FAME users flexibility in accessing and processing time series data, case series, and formulas that reside in either a FAME database or a SAS data set
• provides a seamless interface between FAME and SAS data processing
• uses the LIBNAME statement to enable you to specify which time series to read from the FAME database
• enables you to convert the selected time series to the same time scale
• works with the SAS DATA step to perform further subsetting and to store the resulting time series in a SAS data set
• enables further analysis, if desired, either in the same SAS session or in another session at a later time
• supports the FAME CROSSLIST function for subsetting via BY groups using the CROSSLIST= option:
   – you can use a FAME namelist that contains your BY variables for selection in the CROSSLIST
   – you can use a SAS input data set, INSET, that contains the BY selection variables along with the WHERE= option in your SASEFAME libref
• supports the use of FAME in a client/server environment that uses the FAME CHLI capability on your FAME server
• enables access to your FAME remote data when you specify the port number of the TCP/IP service that is defined for your FAME server and the node name of your FAME master server in your SASEFAME libref's physical path

The SASEHAVR interface LIBNAME engine includes the following features:

• gives Windows users random access to economic and financial data residing in a HAVER ANALYTICS Data Link Express (DLX) database
• makes the following types of HAVER data sets available:
   – United States Economic Indicators
   – Specialized Databases
   – Financial Indicators
   – Industry
   – Industrial Countries
   – Emerging Markets
   – International Organizations
   – Forecasts and As Reported Data
   – United States Regional
• enables you to limit the range of data that is read from the time series
• enables you to specify a desired conversion frequency; start dates are recommended on the LIBNAME statement to help you save resources when processing large databases or when processing a large number of observations
• enables you to use the WHERE, KEEP, or DROP statements in your DATA step to further subset your data
• supports use of the SQL procedure to create a view of your resulting SAS data set
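As an illustrative sketch only (the library path, database name, and option values shown are hypothetical), a SASEHAVR libref might be assigned and read like this:

libname hvr sasehavr 'C:\haver\' freq=quarterly start=19950101;

data work.us_econ;
   set hvr.usecon;                 /* read series from a Haver DLX database */
   where date >= '01jan2000'd;     /* further limit the date range */
run;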
Spreadsheet Calculations and Financial Report Generation

The COMPUTAB procedure generates tabular reports using a programmable data table. The COMPUTAB procedure is especially useful when you need both the power of a programmable spreadsheet and a report-generation system and you want to set up a program to run in batch mode and generate routine reports. The COMPUTAB procedure includes the following features:

• report generation facility for creating tabular reports such as income statements, balance sheets, and other row and column reports for analyzing business or time series data
• ability to tailor report format to almost any desired specification
• use of the SAS programming language to provide complete control of the calculation and format of each item of the report
• ability to define the report in terms of a data table on which programming statements operate
• ability for a single reference to a row or column to bring the entire row or column into a calculation
• ability to create new rows and columns (such as totals, subtotals, and ratios) with a single programming statement
• access to individual table values when needed
• built-in features to provide consolidation reports over summarization variables
Loan Analysis, Comparison, and Amortization

The LOAN procedure provides analysis and comparison of mortgages and other installment loans; it includes the following features:

• ability to specify contract terms for any number of different loans and ability to analyze and compare various financing alternatives
• analysis of four different types of loan contracts including the following:
   – fixed rate
   – adjustable rate
   – buy-down rate
   – balloon payment
• full control over adjustment terms for adjustable rate loans: life caps, adjustment frequency, and maximum and minimum rates
• support for a wide variety of payment and compounding intervals
• ability to incorporate initialization costs, discount points, down payments, and prepayments (uniform or lump-sum) in loan calculations
• analysis of different rate adjustment scenarios for variable rate loans including the following:
   – worst case
   – best case
   – fixed rate case
   – estimated case
• ability to make loan comparisons at different points in time
• ability to make loan comparisons at each analysis date on the basis of five different economic criteria:
   – present worth of cost (net present value of all payments to date)
   – true interest rate (internal rate of return to date)
   – current periodic payment
   – total interest paid to date
   – outstanding balance
• ability to base loan comparisons on either after-tax or before-tax analysis
• report of the best alternative when loans of equal amount are compared
• amortization schedules for each loan contract
• output that shows payment dates, rather than just payment sequence numbers, when the starting date is specified
• optional printing or output of the amortization schedules, loan summaries, and loan comparison information to SAS data sets
• ability to specify rounding of payments to any number of decimal places
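For example, a sketch comparing two fixed-rate alternatives at two analysis dates (all loan terms shown are hypothetical):

proc loan start=2010:1;
   fixed amount=200000 rate=6.0 life=360 label='30-year fixed';
   fixed amount=200000 rate=5.5 life=180 label='15-year fixed';
   compare at=(60 120);   /* compare after 60 and 120 payments */
run;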
Time Series Forecasting System

SAS/ETS software includes the Time Series Forecasting System, a point-and-click application for exploring and analyzing univariate time series data. You can use the automatic model selection facility to select the best-fitting model for each time series, or you can use the system's diagnostic features and time series modeling tools interactively to develop forecasting models customized to best predict your time series. The system provides both graphical and statistical features to help you choose the best forecasting method for each series. The system can be invoked by selecting Analysis → Solutions from the menus, by issuing the FORECAST command, or by clicking the Forecasting icon in the Data Analysis folder of the SAS Desktop.

The following is a brief summary of the features of the Time Series Forecasting System. With the system you can:

• use a wide variety of forecasting methods, including several kinds of exponential smoothing models, Winters method, and ARIMA (Box-Jenkins) models. You can also produce forecasts by combining the forecasts from several models.
• use predictor variables in forecasting models. Forecasting models can include time trend curves, regressors, intervention effects (dummy variables), adjustments you specify, and dynamic regression (transfer function) models.
• view plots of the data, predicted versus actual values, prediction errors, and forecasts with confidence limits. You can plot changes or transformations of series, zoom in on parts of the graphs, or plot autocorrelations.
• use hold-out samples to select the best forecasting method
• compare goodness-of-fit measures for any two forecasting models side-by-side or list all models sorted by a particular fit statistic
• view the predictions and errors for each model in a spreadsheet or view and compare the forecasts from any two models in a spreadsheet
• examine the fitted parameters of each forecasting model and their statistical significance
• control the automatic model selection process: the set of forecasting models considered, the goodness-of-fit measure used to select the best model, and the time period used to fit and evaluate models
• customize the system by adding forecasting models for the automatic model selection process and for point-and-click manual selection
• save your work in a project catalog
• print an audit trail of the forecasting process
• save and print system output including spreadsheets and graphs
Investment Analysis System

The Investment Analysis System is an interactive environment for analyzing the time value of money for a variety of investments:

• loans
• savings
• depreciations
• bonds
• generic cash flows

Various tools are provided to help analyze the value of investment alternatives: time value, periodic equivalent, internal rate of return, benefit-cost ratio, and breakeven analysis. These analyses can help answer a number of questions you might have about your investments:

• Which option is more profitable or less costly?
• Is it better to buy or rent?
• Are the extra fees for refinancing at a lower interest rate justified?
• What is the balance of this account after saving this amount periodically for so many years?
• How much is legally tax-deductible?
• Is this a reasonable price?

Investment Analysis can be beneficial to users in many industries for a variety of decisions:

• manufacturing: cost justification of automation or any capital investment, replacement analysis of major equipment, or economic comparison of alternative designs
• government: setting funds for services
• finance: investment analysis and portfolio management for fixed-income securities
ODS Graphics

Many SAS/ETS procedures produce graphical output using the SAS Output Delivery System (ODS). The ODS Graphics system provides several advantages:

• Plots and graphs are output objects in the Output Delivery System (ODS) and can be manipulated with ODS commands.
• There is no need to write SAS/GRAPH statements or use special plotting macros.
• There are multiple output formats to choose from: HTML, GIF, and RTF.
• Templates control the appearance of plots.
• Styles control the color scheme.
• You can edit or create templates and styles for all graphs.

To enable graphical output from SAS/ETS procedures, you must use the following statement in your SAS program:

ods graphics on;
The graphical output produced by many SAS/ETS procedures can be controlled by using the PLOTS= option in the PROC statement. For more information about the features of the ODS Graphics system, including the many ways that you can control or customize the plots produced by SAS procedures, refer to Chapter 21, “Statistical Graphics Using ODS” (SAS/STAT User's Guide). For more information about the SAS Output Delivery System, refer to the SAS Output Delivery System: User's Guide.
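For example, the following sketch (using the SASHELP.AIR sample data set shipped with SAS) enables ODS Graphics and requests all available plots from PROC ARIMA:

ods graphics on;

proc arima data=sashelp.air plots=all;
   identify var=air(1,12);   /* identification plots: ACF, PACF, IACF panels */
run;

ods graphics off;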
Related SAS Software

Many features not found in SAS/ETS software are available in other parts of the SAS System, such as Base SAS®, SAS® Forecast Server, SAS/STAT® software, SAS/OR® software, SAS/QC® software, SAS® Stat Studio, and SAS/IML® software.

If you do not find something you need in SAS/ETS software, you might be able to find it in SAS/STAT software or in Base SAS software. If you still do not find it, look in other SAS software products or contact SAS Technical Support staff.

The following subsections summarize the features of other SAS products that might be of interest to users of SAS/ETS software.
Base SAS Software

The features provided by SAS/ETS software are extensions to the features provided by Base SAS software. Many data management and reporting capabilities you need are part of Base SAS software. Refer to SAS Language Reference: Dictionary and Base SAS Procedures Guide for documentation of Base SAS software. In particular, refer to Base SAS Procedures Guide: Statistical Procedures for information about statistical analysis features included with Base SAS.

The following sections summarize Base SAS software features of interest to users of SAS/ETS software. See Chapter 3, “Working with Time Series Data,” for further discussion of some of these topics as they relate to time series data and SAS/ETS software.
SAS DATA Step

The DATA step is your primary tool for reading and processing data in the SAS System. The DATA step provides a powerful general-purpose programming language that enables you to perform all kinds of data processing tasks. The DATA step is documented in SAS Language Reference: Dictionary.
Base SAS Procedures

Base SAS software includes many useful SAS procedures, which are documented in Base SAS Procedures Guide and Base SAS Procedures Guide: Statistical Procedures. The following is a list of Base SAS procedures you might find useful:

CATALOG      for managing SAS catalogs
CHART        for printing charts and histograms
COMPARE      for comparing SAS data sets
CONTENTS     for displaying the contents of SAS data sets
COPY         for copying SAS data sets
CORR         for computing correlations
CPORT        for moving SAS data libraries between computer systems
DATASETS     for deleting or renaming SAS data sets
FCMP         for compiling functions for use in SAS programs. The SAS Function Compiler Procedure (FCMP) enables you to create, test, and store SAS functions and subroutines before you use them in other SAS procedures. PROC FCMP accepts slight variations of DATA step statements, and most features of the SAS programming language can be used in functions and subroutines that are processed by PROC FCMP. (A brief sketch follows this list.)
FREQ         for computing frequency crosstabulations
MEANS        for computing descriptive statistics and summarizing or collapsing data over cross sections
PLOT         for printing scatter plots
PRINT        for printing SAS data sets
PROTO        for accessing external functions from the SAS System. The PROTO procedure enables you to register external functions that are written in the C or C++ programming languages. You can use these functions in SAS as well as in C-language structures and types. After the C-language functions are registered in PROC PROTO, they can be called from any SAS function or subroutine that is declared in the FCMP procedure, as well as from any SAS function, subroutine, or method block that is declared in the COMPILE procedure.
RANK         for computing rankings or order statistics
SORT         for sorting SAS data sets
SQL          for processing SAS data sets with Structured Query Language
STANDARD     for standardizing variables to a fixed mean and variance
TABULATE     for printing descriptive statistics in tabular format
TIMEPLOT     for plotting variables over time
TRANSPOSE    for transposing SAS data sets
UNIVARIATE   for computing descriptive statistics
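A minimal PROC FCMP sketch (the function name, library, and data set names are hypothetical): define a log-return function and use it in a later DATA step:

proc fcmp outlib=work.funcs.ts;
   function logret(p, plag);      /* log return from two prices */
      return (log(p / plag));
   endsub;
run;

options cmplib=work.funcs;        /* make the compiled function visible */

data work.returns;
   set work.prices;
   r = logret(price, lag(price));
run;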
Global Statements

Global statements can be specified anywhere in your SAS program, and they remain in effect until changed. Global statements are documented in SAS Language Reference: Dictionary. You may find the following SAS global statements useful:

FILENAME     for accessing data files
FOOTNOTE     for printing footnote lines at the bottom of each page
%INCLUDE     for including files of SAS statements
LIBNAME      for accessing SAS data libraries
OPTIONS      for setting various SAS system options
QUIT         for ending an interactive procedure step
RUN          for executing the preceding SAS statements
TITLE        for printing title lines at the top of each page
X            for issuing host operating system commands from within your SAS session

Some Base SAS statements can be used with any SAS procedure, including SAS/ETS procedures. These statements are not global, and they affect only the SAS procedure they are used with. These statements are documented in SAS Language Reference: Dictionary. The following Base SAS statements are useful with SAS/ETS procedures:

BY           for computing separate analyses for groups of observations
FORMAT       for assigning formats to variables
LABEL        for assigning descriptive labels to variables
WHERE        for subsetting data to restrict the range of data processed or to select or exclude observations from the analysis
SAS Functions

SAS functions can be used in DATA step programs and in the COMPUTAB and MODEL procedures. The following kinds of functions are available:

• character functions for manipulating character strings
• date and time functions for performing date and calendar calculations
• financial functions for performing financial calculations such as depreciation, net present value, periodic savings, and internal rate of return
• lagging and differencing functions for computing lags and differences
• mathematical functions for computing data transformations and other mathematical calculations
• probability functions for computing quantiles of statistical distributions and the significance of test statistics
• random number functions for simulation experiments
• sample statistics functions for computing means, standard deviations, kurtosis, and so forth

SAS functions are documented in SAS Language Reference: Dictionary. Chapter 3, “Working with Time Series Data,” discusses the use of date, time, lagging, and differencing functions. Chapter 4, “Date Intervals, Formats, and Functions,” contains a reference list of date and time functions.
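For instance, a DATA step sketch (hypothetical data set and variable names) that combines date and differencing functions:

data work.returns;
   set work.prices;
   next_month = intnx('month', date, 1);   /* advance date by one month */
   logret = dif(log(price));               /* first difference of log price */
   format date next_month date9.;
run;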
Formats, Informats, and Time Intervals

Base SAS software provides formats to control the printing of data values, informats to read data values, and time intervals to define the frequency of time series. See Chapter 4, “Date Intervals, Formats, and Functions,” for more information.
SAS Forecast Studio

SAS Forecast Studio is part of the SAS Forecast Server product. It provides an interactive environment for modeling and forecasting very large collections of hierarchically organized time series, such as SKUs in product lines and sales regions of a retail business. Forecast Studio greatly extends the capabilities provided by the Time Series Forecasting System included with SAS/ETS and described in Part IV.

Forecast Studio is documented in SAS Forecast Server User's Guide.
SAS High-Performance Forecasting

SAS High-Performance Forecasting (HPF) software provides a system of SAS procedures for large-scale automatic forecasting in business, government, and academic applications. Major uses of High-Performance Forecasting procedures include forecasting, forecast scoring, market response modeling, and time series data mining. The software includes the following automatic forecasting process:

• accumulates the time-stamped data to form a fixed-interval time series
• diagnoses the time series using time series analysis techniques
• creates a list of candidate model specifications based on the diagnostics
• fits each candidate model specification to the time series
• generates forecasts for each candidate fitted model
• selects the most appropriate model specification based on either in-sample or holdout-sample evaluation using a model selection criterion
• refits the selected model specification to the entire range of the time series
• creates a forecast score from the selected fitted model
• generates forecasts from the forecast score
• evaluates the forecast using in-sample analysis
• provides for out-of-sample forecast performance analysis
• performs top-down, middle-out, or bottom-up reconciliations of forecasts in the hierarchy
SAS/GRAPH Software

SAS/GRAPH software includes procedures that create two- and three-dimensional high-resolution color graphics plots and charts. You can generate output that graphs the relationship of data values to one another, enhance existing graphs, or simply create graphics output that is not tied to data.

With the addition of ODS Graphics features to SAS/ETS procedures, there is now less need for the use of SAS/GRAPH procedures with SAS/ETS. However, SAS/GRAPH procedures allow you to create additional graphical displays of your results.
SAS/GRAPH software can produce the following types of output:

• charts
• plots
• maps
• text
• three-dimensional graphs

With SAS/GRAPH software you can produce high-resolution color graphics plots of time series data.
SAS/STAT Software

SAS/STAT software is of interest to users of SAS/ETS software because many econometric and other statistical methods not included in SAS/ETS software are provided in SAS/STAT software. SAS/STAT software includes procedures for a wide range of statistical methodologies including the following:

• logistic regression
• censored regression
• principal component analysis
• structural equation models using covariance structure analysis
• factor analysis
• survival analysis
• discriminant analysis
• cluster analysis
• categorical data analysis; log-linear and conditional logistic models
• general linear models
• mixed linear and nonlinear models
• generalized linear models
• response surface analysis
• kernel density estimation
• LOESS regression
• spline regression
• two-dimensional kriging
• multiple imputation for missing values
• survey data analysis
SAS/IML Software

SAS/IML software gives you access to a powerful and flexible programming language (Interactive Matrix Language) in a dynamic, interactive environment. The fundamental object of the language is a data matrix. You can use SAS/IML software interactively (at the statement level) to see results immediately, or you can store statements in a module and execute them later. The programming is dynamic because necessary activities such as memory allocation and dimensioning of matrices are done automatically.

You can access built-in operators and call routines to perform complex tasks such as matrix inversion or eigenvector generation. You can define your own functions and subroutines using SAS/IML modules. You can perform operations on an entire data matrix. You have access to a wide choice of data management commands. You can read, create, and update SAS data sets from inside SAS/IML software without ever using the DATA step.

SAS/IML software is of interest to users of SAS/ETS software because it enables you to program your own econometric and time series methods in the SAS System. It contains subroutines for time series operators and for general function optimization. If you need to perform a statistical calculation not provided as an automated feature by SAS/ETS or other SAS software, you can use SAS/IML software to program the matrix equations for the calculation.
Kalman Filtering and Time Series Analysis in SAS/IML

SAS/IML software includes CALL routines and functions for Kalman filtering and time series analysis, which perform the following:

• generate univariate, multivariate, and fractional time series
• compute the likelihood function of ARMA, VARMA, and ARFIMA models
• compute the autocovariance function of ARMA, VARMA, and ARFIMA models
• check the stationarity of ARMA and VARMA models
• filter and smooth time series models by using the Kalman method
• fit AR, periodic AR, time-varying coefficient AR, VAR, and ARFIMA models
• handle Bayesian seasonal adjustment models
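As an illustration (the parameter values are hypothetical), the SAS/IML ARMASIM function can simulate an ARMA(1,1) series for experimentation:

proc iml;
   phi   = {1 -0.5};                          /* AR polynomial coefficients */
   theta = {1  0.3};                          /* MA polynomial coefficients */
   y = armasim(phi, theta, 0, 1, 200, 1234);  /* mean 0, sigma 1, n=200, seed */
   print (y[1:5]);                            /* inspect the first few values */
quit;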
SAS/IML Stat Studio

SAS/IML Studio is a highly interactive tool for data exploration and analysis. SAS/IML Studio runs on a PC in the Microsoft Windows operating environment. You can use SAS/IML Studio to do the following:

• explore data through graphs linked across multiple windows
• transform data
• subset data
• analyze univariate distributions
• discover structure and features in multivariate data
• fit and evaluate explanatory models
• create your own customized statistical graphics
• add legends, curves, maps, or other custom features to statistical graphics
• develop interactive programs that use dialog boxes
• extend the built-in analyses by calling SAS procedures
• create custom analyses
• repeat an analysis on different data
• extend the results of SAS procedures by using IML
• share analyses with colleagues who also use SAS/IML Studio
• call functions from libraries written in R, C/C++, FORTRAN, or Java

See SAS/IML Studio User's Guide for more information.
SAS/OR Software

SAS/OR software provides SAS procedures for operations research and project planning and includes a menu-driven system for project management. SAS/OR software has features for the following:

• solving transportation problems
• linear, integer, and mixed-integer programming
• nonlinear programming and optimization
• scheduling projects
• plotting Gantt charts
• drawing network diagrams
• solving optimal assignment problems
• network flow programming

SAS/OR software might be of interest to users of SAS/ETS software for its mathematical programming features. In particular, the NLP and OPTMODEL procedures in SAS/OR software solve nonlinear programming problems and can be used for constrained and unconstrained maximization of user-defined likelihood functions. See SAS/OR User's Guide: Mathematical Programming for more information.
SAS/QC Software

SAS/QC software provides a variety of procedures for statistical quality control and quality improvement. SAS/QC software includes procedures for the following:

• Shewhart control charts
• cumulative sum control charts
• moving average control charts
• process capability analysis
• Ishikawa diagrams
• Pareto charts
• experimental design

SAS/QC software also includes the SQC menu system for interactive application of statistical quality control methods and the ADX Interface for experimental design.
MLE for User-Defined Likelihood Functions

There are several SAS procedures that enable you to do maximum likelihood estimation of parameters in an arbitrary model with a likelihood function that you define: PROC MODEL, PROC NLP, PROC OPTMODEL, and PROC IML.
The MODEL procedure in SAS/ETS software enables you to minimize general log-likelihood functions for the error term of a model.

The NLP and OPTMODEL procedures in SAS/OR software are general nonlinear programming procedures that can maximize a general function subject to linear equality or inequality constraints. You can use PROC NLP or OPTMODEL to maximize a user-defined nonlinear likelihood function.

You can use the IML procedure in SAS/IML software for maximum likelihood problems. The optimization routines used by PROC NLP are available through IML subroutines. You can write the likelihood function in the SAS/IML matrix language and call the constrained and unconstrained nonlinear programming subroutines to maximize the likelihood function with respect to the parameter vector.
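For example, a sketch (the data set and variable are hypothetical) of maximum likelihood estimation of a normal mean and standard deviation with PROC NLP:

proc nlp data=work.sample;
   max loglik;                 /* maximize the log likelihood summed over observations */
   parms mu = 0, sigma = 1;    /* starting values */
   bounds sigma > 1e-8;        /* keep the scale parameter positive */
   loglik = -log(sigma) - 0.5 * ((x - mu) / sigma)**2;
run;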
JMP® Software

JMP software uses a flexible graphical interface to display and analyze data. JMP dynamically links statistics and graphics so you can easily explore data, make discoveries, and gain the knowledge you need to make better decisions. JMP provides a comprehensive set of statistical tools as well as design of experiments (DOE) and advanced quality control (QC and SPC) tools for Six Sigma in a single package. JMP is software for interactive statistical graphics and includes the following:

• a data table window for editing, entering, and manipulating data
• a broad range of graphical and statistical methods for data analysis
• a facility for grouping data and computing summary statistics
• JMP scripting language (JSL)—a scripting language for saving and creating frequently used routines
• JMP automation
• Formula Editor—a formula editor for each table column to compute values as needed
• linear models, correlations, and multivariate analyses
• design of experiments module
• options to highlight and display subsets of data
• statistical quality control and variability charts—special plots, charts, and communication capability for quality-improvement techniques
• survival analysis
• time series analysis, which includes the following:
   – Box-Jenkins ARIMA forecasting
   – seasonal ARIMA forecasting
   – transfer function modeling
   – smoothing models: Winters method, single, double, linear, damped trend linear, and seasonal exponential smoothing
   – diagnostic charts (autocorrelation, partial autocorrelation, and variogram) and statistics of fit
   – a model comparison table to compare all forecasts generated
   – spectral density plots and white noise tests
• tools for printing and for moving analysis results between applications
SAS® Enterprise Guide

SAS Enterprise Guide has the following features:

• integration with the SAS9 platform:
   – open metadata repository (OMR) integration
   – SAS report integration: create report interface, ODS support, Web report studio integration
   – access to information maps
   – ETL studio impact analysis
   – ESRI integration within the OLAP analyzer
   – data mining scoring task
• the user interface and workflow:
   – process flow
   – ability to create stored processes from process flows
   – SAS folders window
   – project parameters
   – query builder interface
   – code node
   – OLAP analyzer: ESRI integration, tree-diagram-based OLAP explorer
   – SAS report snapshots
   – SAS Web OLAP viewer for .NET
   – ability to create EG projects
   – workspace maximization
With Enterprise Guide, you can perform time series analysis with the following EG procedures:

• prepare time series data—the Prepare Time Series Data task can be used to make data more suitable for analysis by other time series tasks
• create time series data—the Create Time Series Data wizard helps you convert transactional data into fixed-interval time series. Transactional data are time-stamped data collected over time with irregular or varied frequency.
• ARIMA Modeling and Forecasting task
• Basic Forecasting task
• Regression Analysis with Autoregressive Errors
• Regression Analysis of Panel Data
SAS® Add-In for Microsoft Office

The main time series tasks in SAS Add-In for Microsoft Office (AMO) are as follows:

• Prepare Time Series Data
• Basic Forecasting
• ARIMA Modeling and Forecasting
• Regression Analysis with Autoregressive Errors
• Regression Analysis of Panel Data
• Create Time Series Data
• Forecast Studio Create Project
• Forecast Studio Open Project
• Forecast Studio Submit Overrides
SAS Enterprise Miner™—Time Series Node

SAS Enterprise Miner™ is the SAS solution for data mining, streamlining the data mining process to create highly accurate predictive and descriptive models. Enterprise Miner's process flow diagram eliminates the need for manual coding and reduces the model development time for both business analysts and statisticians. The system is customizable and extensible; users can integrate their code and build new nodes for redistribution.
The Time Series node is a method of investigating time series data. It belongs to the Modify category of the SAS SEMMA (sample, explore, modify, model, assess) data mining process. The Time Series node enables you to understand trends and seasonal variation in large amounts of time series and transactional data. The Time Series node in SAS Enterprise Miner enables you to do the following:

• perform time series analysis
• perform forecasting
• work with transactional data
SAS Risk Products

The SAS Risk products include SAS Risk Dimensions®, SAS Credit Risk Management for Banking, SAS OpRisk VaR, and SAS OpRisk Monitor.

The analytical methods of SAS Risk Dimensions measure market risk and credit risk. SAS Risk Dimensions creates an environment where market and position data are staged for analysis using SAS data access and warehousing methodologies. SAS Risk Dimensions delivers a full range of modern credit, market, and operational risk analysis techniques including the following:

• mark-to-market
• scenario analysis
• profit/loss curves and surfaces
• sensitivity analysis
• delta normal VaR
• historical simulation VaR
• Monte Carlo VaR
• current exposure
• potential exposure
• credit VaR
• optimization

SAS Credit Risk Management for Banking is a complete end-to-end application for measuring, exploring, managing, and reporting credit risk. SAS Credit Risk Management for Banking integrates data access, mapping, enrichment, and aggregation with advanced analytics and flexible reporting, all in an open, extensible, client-server framework. SAS Credit Risk Management for Banking enables you to do the following:
• access and aggregate credit risk data across disparate operating systems and sources
• seamlessly integrate credit scoring/internal rating with credit portfolio risk assessment
• accurately measure, monitor, and report potential credit risk exposures within entities of an organization and aggregated across the entire organization, both on the counterparty level and the portfolio level
• evaluate alternative strategies for pricing, hedging, or transferring credit risk
• optimize the allocation of credit risk mitigants or assign the mitigants to lower the regulatory capital requirement
• optimize the allocation of regulatory capital and economic capital
• facilitate regulatory compliance and risk disclosure requirements for a wide variety of regulations such as Basel I, Basel II, and the Capital Requirements Directive (CAD III)
References

Amal, S. and Weselowski, R. (1993), "Practical Econometric Analysis for Assessment of Real Property: Using the SAS System on Personal Computers," Proceedings of the Eighteenth Annual SAS Users Group International Conference, 385–390. Cary, NC: SAS Institute Inc.

Benseman, B. (1990), "Better Forecasting with SAS/ETS Software," Proceedings of the Fifteenth Annual SAS Users Group International Conference, 494–497. Cary, NC: SAS Institute Inc.

Calise, A. and Earley, J. (1997), "Forecasting College Enrollment Using the SAS System," Proceedings of the Twenty-Second Annual SAS Users Group International Conference, 1326–1329. Cary, NC: SAS Institute Inc.

Early, J., Sweeney, J., and Zekavat, S. M. (1989), "PROC ARIMA and the Dow Jones Stock Index," Proceedings of the Fourteenth Annual SAS Users Group International Conference, 371–375. Cary, NC: SAS Institute Inc.

Fischetti, T., Heathcote, S., and Perry, D. (1993), "Using SAS to Create a Modular Forecasting System," Proceedings of the Eighteenth Annual SAS Users Group International Conference, 580–585. Cary, NC: SAS Institute Inc.

Fleming, N. S., Gibson, E., and Fleming, D. G. (1996), "The Use of PROC ARIMA to Test an Intervention Effect," Proceedings of the Twenty-First Annual SAS Users Group International Conference, 1317–1326. Cary, NC: SAS Institute Inc.

Hisnanick, J. J. (1991), "Evaluating Input Separability in a Model of the U.S. Manufacturing Sector," Proceedings of the Sixteenth Annual SAS Users Group International Conference, 688–693. Cary, NC: SAS Institute Inc.
Hisnanick, J. J. (1992), "Using PROC ARIMA in Forecasting the Demand and Utilization of Inpatient Hospital Services," Proceedings of the Seventeenth Annual SAS Users Group International Conference, 383–391. Cary, NC: SAS Institute Inc.

Hisnanick, J. J. (1993), "Using SAS/ETS in Applied Econometrics: Parameter Estimates for the CES-Translog Specification," Proceedings of the Eighteenth Annual SAS Users Group International Conference, 275–279. Cary, NC: SAS Institute Inc.

Hoyer, K. K. and Gross, K. C. (1993), "Spectral Decomposition and Reconstruction of Nuclear Plant Signals," Proceedings of the Eighteenth Annual SAS Users Group International Conference, 1153–1158. Cary, NC: SAS Institute Inc.

Keshani, D. A. and Taylor, T. N. (1992), "Weather Sensitive Appliance Load Curves; Conditional Demand Estimation," Proceedings of the Seventeenth Annual SAS Users Group International Conference, 422–430. Cary, NC: SAS Institute Inc.

Khan, M. H. (1990), "Transfer Function Model for Gloss Prediction of Coated Aluminum Using the ARIMA Procedure," Proceedings of the Fifteenth Annual SAS Users Group International Conference, 517–522. Cary, NC: SAS Institute Inc.

Le Bouton, K. J. (1989), "Performance Function for Aircraft Production Using PROC SYSLIN and L2 Norm Estimation," Proceedings of the Fourteenth Annual SAS Users Group International Conference, 424–426. Cary, NC: SAS Institute Inc.

Lin, L. and Myers, S. C. (1988), "Forecasting the Economy Using the Composite Leading Index, Its Components, and a Rational Expectations Alternative," Proceedings of the Thirteenth Annual SAS Users Group International Conference, 181–186. Cary, NC: SAS Institute Inc.

McCarty, L. (1994), "Forecasting Operational Indices Using SAS/ETS Software," Proceedings of the Nineteenth Annual SAS Users Group International Conference, 844–848. Cary, NC: SAS Institute Inc.

Morelock, M. M., Pargellis, C. A., Graham, E. T., Lamarre, D., and Jung, G. (1995), "Time-Resolved Ligand Exchange Reactions: Kinetic Models for Competitive Inhibitors with Recombinant Human Renin," Journal of Medicinal Chemistry, 38, 1751–1761.

Parresol, B. R. and Thomas, C. E. (1991), "Econometric Modeling of Sweetgum Stem Biomass Using the IML and SYSLIN Procedures," Proceedings of the Sixteenth Annual SAS Users Group International Conference, 694–699. Cary, NC: SAS Institute Inc.
Chapter 3
Working with Time Series Data

Contents
Overview
Time Series and SAS Data Sets
    Introduction
    Reading a Simple Time Series
Dating Observations
    SAS Date, Datetime, and Time Values
    Reading Date and Datetime Values with Informats
    Formatting Date and Datetime Values
    The Variables DATE and DATETIME
    Sorting by Time
Subsetting Data and Selecting Observations
    Subsetting SAS Data Sets
    Using the WHERE Statement with SAS Procedures
    Using SAS Data Set Options
Storing Time Series in a SAS Data Set
    Standard Form of a Time Series Data Set
    Several Series with Different Ranges
    Missing Values and Omitted Observations
    Cross-Sectional Dimensions and BY Groups
    Interleaved Time Series
    Output Data Sets of SAS/ETS Procedures
Time Series Periodicity and Time Intervals
    Specifying Time Intervals
    Using Intervals with SAS/ETS Procedures
    Time Intervals, the Time Series Forecasting System, and the Time Series Viewer
Plotting Time Series
    Using the Time Series Viewer
    Using PROC SGPLOT
    Using PROC PLOT
    Using PROC TIMEPLOT
    Using PROC GPLOT
Calendar and Time Functions
    Computing Dates from Calendar Variables
    Computing Calendar Variables from Dates
    Converting between Date, Datetime, and Time Values
    Computing Datetime Values
    Computing Calendar and Time Variables
Interval Functions INTNX and INTCK
    Incrementing Dates by Intervals
    Alignment of SAS Dates
    Computing the Width of a Time Interval
    Computing the Ceiling of an Interval
    Counting Time Intervals
    Checking Data Periodicity
    Filling In Omitted Observations in a Time Series Data Set
    Using Interval Functions for Calendar Calculations
Lags, Leads, Differences, and Summations
    The LAG and DIF Functions
    Multiperiod Lags and Higher-Order Differencing
    Percent Change Calculations
    Leading Series
    Summing Series
Transforming Time Series
    Log Transformation
    Other Transformations
    The EXPAND Procedure and Data Transformations
Manipulating Time Series Data Sets
    Splitting and Merging Data Sets
    Transposing Data Sets
Time Series Interpolation
    Interpolating Missing Values
    Interpolating to a Higher or Lower Frequency
    Interpolating between Stocks and Flows, Levels and Rates
Reading Time Series Data
    Reading a Simple List of Values
    Reading Fully Described Time Series in Transposed Form
Overview

This chapter discusses working with time series data in the SAS System. The following topics are included:
- dating time series and working with SAS date and datetime values
- subsetting data and selecting observations
- storing time series data in SAS data sets
- specifying time series periodicity and time intervals
- plotting time series
- using calendar and time interval functions
- computing lags and other functions across time
- transforming time series
- transposing time series data sets
- interpolating time series
- reading time series data recorded in different ways

In general, this chapter focuses on using features of the SAS programming language and not on features of SAS/ETS software. However, since SAS/ETS procedures are used to analyze time series, understanding how to use the SAS programming language to work with time series data is important for the effective use of SAS/ETS software.

You do not need to read this chapter to use SAS/ETS procedures. If you are already familiar with SAS programming, you might want to skip this chapter, or you can refer to sections of this chapter for help with specific time series data processing questions.
Time Series and SAS Data Sets
Introduction

To analyze data with the SAS System, data values must be stored in a SAS data set. A SAS data set is a matrix (or table) of data values organized into variables and observations. The variables in a SAS data set label the columns of the data matrix, and the observations in a SAS data set are the rows of the data matrix. You can also think of a SAS data set as a kind of file, with the observations representing records in the file and the variables representing fields in the records. (See SAS Language Reference: Concepts for more information about SAS data sets.)

Usually, each observation represents the measurement of one or more variables for the individual subject or item observed. Often, the values of some of the variables in the data set are used to identify the individual subjects or items that the observations measure. These identifying variables are referred to as ID variables. For many kinds of statistical analysis, only relationships among the variables are of interest, and the identity of the observations does not matter. ID variables might not be relevant in such a case.
However, for time series data the identity and order of the observations are crucial. A time series is a set of observations made at a succession of equally spaced points in time. For example, if the data are monthly sales of a company’s product, the variable measured is sales of the product and the unit observed is the operation of the company during each month. These observations can be identified by year and month. If the data are quarterly gross national product, the variable measured is final goods production and the unit observed is the economy during each quarter. These observations can be identified by year and quarter. For time series data, the observations are identified and related to each other by their position in time. Since SAS does not assume any particular structure to the observations in a SAS data set, there are some special considerations needed when storing time series in a SAS data set. The main considerations are how to associate dates with the observations and how to structure the data set so that SAS/ETS procedures and other SAS procedures recognize the observations of the data set as constituting time series. These issues are discussed in following sections.
Reading a Simple Time Series

Time series data can be recorded in many different ways. The section "Reading Time Series Data" on page 123 discusses some of the possibilities. The example below shows a simple case.

The following SAS statements read monthly values of the U.S. Consumer Price Index for June 1990 through July 1991. The data set USCPI is shown in Figure 3.1.

data uscpi;
   input year month cpi;
datalines;
1990 6 129.9
1990 7 130.4

   ... more lines ...
proc print data=uscpi;
run;
Figure 3.1 Time Series Data

   Obs    year    month     cpi
     1    1990      6      129.9
     2    1990      7      130.4
     3    1990      8      131.6
     4    1990      9      132.7
     5    1990     10      133.5
     6    1990     11      133.8
     7    1990     12      133.8
     8    1991      1      134.6
     9    1991      2      134.8
    10    1991      3      135.0
    11    1991      4      135.2
    12    1991      5      135.6
    13    1991      6      136.0
    14    1991      7      136.2
When a time series is stored in the manner shown by this example, the terms series and variable can be used interchangeably. There is one observation per row and one series/variable per column.
Dating Observations

The SAS System supports special date, datetime, and time values, which make it easy to represent dates, perform calendar calculations, and identify the time period of observations in a data set.

The preceding example uses the ID variables YEAR and MONTH to identify the time periods of the observations. For a quarterly data set, you might use YEAR and QTR as ID variables. A daily data set might have the ID variables YEAR, MONTH, and DAY. Clearly, it would be more convenient to have a single ID variable that could be used to identify the time period of observations, regardless of their frequency.

The following section, "SAS Date, Datetime, and Time Values" on page 68, discusses how the SAS System represents dates and times internally and how to specify date, datetime, and time values in a SAS program. The section "Reading Date and Datetime Values with Informats" on page 69 discusses how to read in date and time values from data records and how to control the display of date and datetime values in SAS output. Later sections discuss other issues concerning date and datetime values, specifying time intervals, data periodicity, and calendar calculations.

SAS date and datetime values and the other features discussed in the following sections are also described in SAS Language Reference: Dictionary. Reference documentation on these features is also provided in Chapter 4, "Date Intervals, Formats, and Functions."
SAS Date, Datetime, and Time Values

SAS Date Values

SAS software represents dates as the number of days since a reference date. The reference date, or date zero, used for SAS date values is 1 January 1960. For example, 3 February 1960 is represented by SAS as 33. The SAS date for 17 October 1991 is 11612. SAS software correctly represents dates from the year 1582 to the year 20,000.

Dates represented in this way are called SAS date values. Any numeric variable in a SAS data set whose values represent dates in this way is called a SAS date variable.

Representing dates as the number of days from a reference date makes it easy for the computer to store them and perform calendar calculations, but these numbers are not meaningful to users. However, you never have to use SAS date values directly, since SAS automatically converts between this internal representation and ordinary ways of expressing dates, provided that you indicate the format with which you want the date values to be displayed. (Formatting of date values is explained in the section "Formatting Date and Datetime Values" on page 70.)
Century of Dates Represented with Two-Digit Year Values

SAS software informats, functions, and formats can process dates that are represented with two-digit year values. The century assumed for a two-digit year value can be controlled with the YEARCUTOFF= option in the OPTIONS statement. The YEARCUTOFF= system option controls how dates with two-digit year values are interpreted by specifying the first year of a 100-year span. The default value for the YEARCUTOFF= option is 1920. Thus by default the year '17' is interpreted as 2017, while the year '25' is interpreted as 1925. (See SAS Language Reference: Dictionary for more information about YEARCUTOFF=.)
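As a quick check of this behavior, a short step such as the following (a minimal sketch; the step and variable name are illustrative) shows how a two-digit year is resolved under the default setting:

options yearcutoff=1920;   /* two-digit years map to the span 1920 through 2019 */

data _null_;
   d = '17oct25'd;         /* with YEARCUTOFF=1920, '25' is read as 1925 */
   put d= date9.;          /* prints D=17OCT1925 */
run;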
SAS Date Constants

SAS date values are written in a SAS program by placing the dates in single quotes followed by a D. The date is represented by the day of the month, the three-letter abbreviation of the month name, and the year. For example, SAS reads the value '17OCT1991'D the same as 11612, the SAS date value for 17 October 1991. Thus, the following SAS statements print DATE=11612:

data _null_;
   date = '17oct1991'd;
   put date=;
run;
The year value can be given with two or four digits, so ‘17OCT91’D is the same as ‘17OCT1991’D.
SAS Datetime Values and Datetime Constants

To represent both the time of day and the date, SAS uses datetime values. SAS datetime values represent the date and time as the number of seconds the time is from a reference time. The reference time, or time zero, used for SAS datetime values is midnight, 1 January 1960. Thus, for example, the SAS datetime value for 17 October 1991 at 2:45 in the afternoon is 1003329900.

To specify datetime constants in a SAS program, write the date and time in single quotes followed by DT. To write the date and time in a SAS datetime constant, write the date part using the same syntax as for date constants, and follow the date part with the hours, the minutes, and the seconds, separating the parts with colons. The seconds are optional. For example, in a SAS program you would write 17 October 1991 at 2:45 in the afternoon as '17OCT91:14:45'DT. SAS reads this as 1003329900. Table 3.1 shows some other examples of datetime constants.

Table 3.1 Examples of Datetime Constants

   Datetime Constant           Time
   '17OCT1991:14:45:32'DT      32 seconds past 2:45 p.m., 17 October 1991
   '17OCT1991:12:5'DT          12:05 p.m., 17 October 1991
   '17OCT1991:2:0'DT           2:00 a.m., 17 October 1991
   '17OCT1991:0:0'DT           midnight, 17 October 1991
SAS Time Values

The SAS System also supports time values. SAS time values are just like datetime values, except that the date part is not given. To write a time value in a SAS program, write the time the same as for a datetime constant, but use T instead of DT. For example, 2:45:32 p.m. is written '14:45:32'T. Time values are represented by a number of seconds since midnight, so SAS reads '14:45:32'T as 53132.

SAS time values are not very useful for identifying time series, since usually both the date and the time of day are needed. Time values are not discussed further in this book.
Reading Date and Datetime Values with Informats

SAS provides a selection of informats for reading SAS date and datetime values from date and time values recorded in ordinary notations. A SAS informat is an instruction that converts the values from a character-string representation into the internal numerical value of a SAS variable. Date informats convert dates from ordinary notations used to enter them to SAS date values; datetime informats convert date and time from ordinary notation to SAS datetime values.

For example, the following SAS statements read monthly values of the U.S. Consumer Price Index. Since the data are monthly, you could identify the date with the variables YEAR and MONTH, as in
the previous example. Instead, in this example the time periods are coded as a three-letter month abbreviation followed by the year. The informat MONYY. is used to read month-year dates coded this way and to express them as SAS date values for the first day of the month, as follows:

data uscpi;
   input date : monyy7. cpi;
   format date monyy7.;
   label cpi = "US Consumer Price Index";
datalines;
jun1990 129.9
jul1990 130.4

   ... more lines ...
The SAS System provides informats for most common notations for dates and times. See Chapter 4 for more information about the date and datetime informats available.
Formatting Date and Datetime Values

SAS provides formats to convert the internal representation of date and datetime values used by SAS to ordinary notations for dates and times. Several different formats are available for displaying dates and datetime values in most of the commonly used notations.

A SAS format is an instruction that converts the internal numerical value of a SAS variable to a character string that can be printed or displayed. Date formats convert SAS date values to a readable form; datetime formats convert SAS datetime values to a readable form.

In the preceding example, the variable DATE was set to the SAS date value for the first day of the month for each observation. If the data set USCPI were printed or otherwise displayed, the values shown for DATE would be the number of days since 1 January 1960. (See the "DATE with no format" column in Figure 3.2.) To display date values appropriately, use the FORMAT statement.

The following example processes the data set USCPI to make several copies of the variable DATE and uses a FORMAT statement to give different formats to these copies. The format cases shown are the MONYY7. format (for the DATE variable), the DATE9. format (for the DATE1 variable), and no format (for the DATE0 variable). The PROC PRINT output in Figure 3.2 shows the effect of the different formats on how the date values are printed.

data fmttest;
   set uscpi;
   date0 = date;
   date1 = date;
   label date  = "DATE with MONYY7. format"
         date1 = "DATE with DATE9. format"
         date0 = "DATE with no format";
   format date monyy7. date1 date9.;
run;

proc print data=fmttest label;
run;
Figure 3.2 SAS Date Values Printed with Different Formats

          DATE with         US Consumer    DATE with    DATE with
   Obs    MONYY7. format    Price Index    no format    DATE9. format
     1    JUN1990              129.9         11109      01JUN1990
     2    JUL1990              130.4         11139      01JUL1990
     3    AUG1990              131.6         11170      01AUG1990
     4    SEP1990              132.7         11201      01SEP1990
     5    OCT1990              133.5         11231      01OCT1990
     6    NOV1990              133.8         11262      01NOV1990
     7    DEC1990              133.8         11292      01DEC1990
     8    JAN1991              134.6         11323      01JAN1991
     9    FEB1991              134.8         11354      01FEB1991
    10    MAR1991              135.0         11382      01MAR1991
    11    APR1991              135.2         11413      01APR1991
    12    MAY1991              135.6         11443      01MAY1991
    13    JUN1991              136.0         11474      01JUN1991
    14    JUL1991              136.2         11504      01JUL1991
The appropriate format to use for SAS date or datetime valued ID variables depends on the sampling frequency or periodicity of the time series. Table 3.2 shows recommended formats for common data sampling frequencies and shows how the date '17OCT1991'D or the datetime value '17OCT1991:14:45:32'DT is displayed by these formats.

Table 3.2 Formats for Different Sampling Frequencies

   ID values       Periodicity    FORMAT          Example
   SAS date        annual         YEAR4.          1991
   SAS date        quarterly      YYQC6.          1991:4
   SAS date        monthly        MONYY7.         OCT1991
   SAS date        weekly         WEEKDATX23.     Thursday, 17 Oct 1991
   SAS date        daily          DATE9.          17OCT1991
   SAS datetime    hourly         DATETIME10.     17OCT91:14
   SAS datetime    minutes        DATETIME13.     17OCT91:14:45
   SAS datetime    seconds        DATETIME16.     17OCT91:14:45:32
See Chapter 4, “Date Intervals, Formats, and Functions,” for more information about the date and datetime formats available.
The Variables DATE and DATETIME

SAS/ETS procedures enable you to identify time series observations in many different ways to suit your needs. As discussed in preceding sections, you can use a combination of several ID variables, such as YEAR and MONTH for monthly data.
However, using a single SAS date or datetime ID variable is more convenient and enables you to take advantage of some features SAS/ETS procedures provide for processing ID variables. One such feature is automatic extrapolation of the ID variable to identify forecast observations. These features are discussed in following sections.

Thus, it is a good practice to include a SAS date or datetime ID variable in all the time series SAS data sets you create. It is also a good practice to always give the date or datetime ID variable a format appropriate for the data periodicity. (For information about creating SAS date and datetime values from multiple ID variables, see the section "Computing Dates from Calendar Variables" on page 95.)

You can assign a SAS date- or datetime-valued ID variable any name that conforms to SAS variable name requirements. However, you might find working with time series data in SAS easier and less confusing if you adopt the practice of always using the same name for the SAS date or datetime ID variable. This book always names the date- or datetime-valued ID variable DATE if it contains SAS date values or DATETIME if it contains SAS datetime values. This makes it easy to recognize the ID variable and also makes it easy to recognize whether this ID variable uses SAS date or datetime values.
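As a minimal sketch of that practice (assuming the YEAR and MONTH variables from the USCPI example earlier in this chapter; the output data set name is arbitrary), the MDY function builds a single DATE ID variable from calendar variables:

data uscpi2;
   set uscpi;
   date = mdy( month, 1, year );   /* SAS date for the first day of each month */
   format date monyy7.;
run;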
Sorting by Time

Many SAS/ETS procedures assume the data are in chronological order. If the data are not in time order, you can use the SORT procedure to sort the data set. For example:

proc sort data=a;
   by date;
run;
There are many ways of coding the time ID variable or variables, and some ways do not sort correctly. If you use SAS date or datetime ID values as suggested in the preceding section, you do not need to be concerned with this issue. But if you encode date values in nonstandard ways, you need to consider whether your ID variables will sort correctly.

SAS date and datetime values always sort correctly, as do combinations of numeric variables such as YEAR, MONTH, and DAY used together. Julian dates also sort correctly. (Julian dates are numbers of the form yyddd, where yy is the year and ddd is the day of the year. For example, 17 October 1991 has the Julian date value 91290.) Calendar dates such as numeric values coded as mmddyy or ddmmyy do not sort correctly. Character variables that contain display values of dates, such as dates in the notation produced by SAS date formats, generally do not sort correctly. One possible conversion for such codings is sketched below.
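This sketch (the data set RAW and the variable OLDDATE are hypothetical) converts numeric mmddyy-coded values into SAS date values so that sorting works:

data fixed;
   set raw;
   /* OLDDATE holds numeric mmddyy codes, e.g., 101791 for 17 October 1991 */
   date = input( put( olddate, z6. ), mmddyy6. );
   format date date9.;
run;

proc sort data=fixed;
   by date;
run;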
Subsetting Data and Selecting Observations

It is often necessary to subset data for analysis. You might need to subset data to do the following:
- restrict the time range. For example, you want to perform a time series analysis using only recent data and ignoring observations from the distant past.
- select cross sections of the data. (See the section "Cross-Sectional Dimensions and BY Groups" on page 79.) For example, you have a data set with observations over time for each of several states, and you want to analyze the data for a single state.
- select particular kinds of time series from an interleaved-form data set. (See the section "Interleaved Time Series" on page 80.) For example, you have an output data set produced by the FORECAST procedure that contains both forecast and confidence limits observations, and you want to extract only the forecast observations.
- exclude particular observations. For example, you have an outlier in your time series, and you want to exclude this observation from the analysis.

You can subset data either by using the DATA step to create a subset data set or by using a WHERE statement with the SAS procedure that analyzes the data. A typical WHERE statement used in a procedure has the following form:

proc arima data=full;
   where '31dec1993'd < date < '26mar1994'd;
   identify var=close;
run;
For complete reference documentation on the WHERE statement, see SAS Language Reference: Dictionary.
Subsetting SAS Data Sets

To create a subset data set, specify the name of the subset data set in the DATA statement, bring in the full data set with a SET statement, and specify the subsetting criteria with either subsetting IF statements or WHERE statements.

For example, suppose you have a data set that contains time series observations for each of several states. The following DATA step uses a WHERE statement to exclude observations with dates before 1970 and uses a subsetting IF statement to select observations for the state NC:

data subset;
   set full;
   where date >= '1jan1970'd;
   if state = 'NC';
run;
In this case, it makes no difference logically whether the WHERE statement or the IF statement is used, and you can combine several conditions in one subsetting statement. The following statements produce the same results as the previous example:

data subset;
   set full;
   if date >= '1jan1970'd & state = 'NC';
run;
The WHERE statement acts on the input data sets specified in the SET statement before observations are processed by the DATA step program, whereas the IF statement is executed as part of the DATA step program. If the input data set is indexed, using the WHERE statement can be more efficient than using the IF statement. However, the WHERE statement can refer only to variables in the input data set, not to variables computed by the DATA step program.

To subset the variables of a data set, use KEEP or DROP statements or use KEEP= or DROP= data set options. See SAS Language Reference: Dictionary for information about KEEP and DROP statements and SAS data set options. For example, suppose you want to subset the data set as in the preceding example, but you want to include in the subset data set only the variables DATE, X, and Y. You could use the following statements:

data subset;
   set full;
   if date >= '1jan1970'd & state = 'NC';
   keep date x y;
run;
Using the WHERE Statement with SAS Procedures

Use the WHERE statement with SAS procedures to process only a subset of the input data set. For example, suppose you have a data set that contains monthly observations for each of several states, and you want to use the AUTOREG procedure to analyze data since 1970 for the state NC. You could use the following statements:

proc autoreg data=full;
   where date >= '1jan1970'd & state = 'NC';
   ... additional statements ...
run;
You can specify any number of conditions in the WHERE statement. For example, suppose that a strike created an outlier in May 1975, and you want to exclude that observation. You could use the following statements:

proc autoreg data=full;
   where date >= '1jan1970'd & state = 'NC'
         & date ^= '1may1975'd;
   ... additional statements ...
run;
Using SAS Data Set Options

You can use the OBS= and FIRSTOBS= data set options to subset the input data set. For example, the following statements print observations 20 through 25 of the data set FULL:

proc print data=full(firstobs=20 obs=25);
run;
Figure 3.3 Partial Listing of Data Set FULL

   Obs    date         state    i        x           y          close
    20    21OCT1993     NC     20     0.44803     0.35302     0.44803
    21    22OCT1993     NC     21     0.03186     1.67414     0.03186
    22    23OCT1993     NC     22    -0.25232    -1.61289    -0.25232
    23    24OCT1993     NC     23     0.42524     0.73112     0.42524
    24    25OCT1993     NC     24     0.05494    -0.88664     0.05494
    25    26OCT1993     NC     25    -0.29096    -1.17275    -0.29096
You can use KEEP= and DROP= data set options to exclude variables from the input data set. See SAS Language Reference: Dictionary for information about SAS data set options.
Storing Time Series in a SAS Data Set

This section discusses aspects of storing time series in SAS data sets. The topics discussed are the standard form of a time series data set, storing several series with different time ranges in the same data set, omitted observations, cross-sectional dimensions and BY groups, and interleaved time series.

Any number of time series can be stored in a SAS data set. Normally, each time series is stored in a separate variable. For example, the following statements augment the USCPI data set read in the previous example with values for the producer price index:

data usprice;
   input date : monyy7. cpi ppi;
   format date monyy7.;
   label cpi = "Consumer Price Index"
         ppi = "Producer Price Index";
datalines;
jun1990 129.9 114.3
jul1990 130.4 114.5

   ... more lines ...
proc print data=usprice;
run;
Figure 3.4 Time Series Data Set Containing Two Series

   Obs    date        cpi      ppi
     1    JUN1990    129.9    114.3
     2    JUL1990    130.4    114.5
     3    AUG1990    131.6    116.5
     4    SEP1990    132.7    118.4
     5    OCT1990    133.5    120.8
     6    NOV1990    133.8    120.1
     7    DEC1990    133.8    118.7
     8    JAN1991    134.6    119.0
     9    FEB1991    134.8    117.2
    10    MAR1991    135.0    116.2
    11    APR1991    135.2    116.0
    12    MAY1991    135.6    116.5
    13    JUN1991    136.0    116.3
    14    JUL1991    136.2    116.0
Standard Form of a Time Series Data Set

The simple way the CPI and PPI time series are stored in the USPRICE data set in the preceding example is termed the standard form of a time series data set. A time series data set in standard form has the following characteristics:
- The data set contains one variable for each time series.
- The data set contains exactly one observation for each time period.
- The data set contains an ID variable or variables that identify the time period of each observation.
- The data set is sorted by the ID variables associated with date time values, so the observations are in time sequence.
- The data are equally spaced in time. That is, successive observations are a fixed time interval apart, so the data set can be described by a single sampling interval such as hourly, daily, monthly, quarterly, yearly, and so forth. This means that time series with different sampling frequencies are not mixed in the same SAS data set.
Most SAS/ETS procedures that process time series expect the input data set to contain time series in this standard form, and this is the simplest way to store time series in SAS data sets. (The EXPAND and TIMESERIES procedures can be helpful in converting your data to this standard form.) There are more complex ways to represent time series in SAS data sets. You can incorporate cross-sectional dimensions with BY groups, so that each BY group is like a standard form time series data set. This method is discussed in the section “Cross-Sectional Dimensions and BY Groups” on page 79. You can interleave time series, with several observations for each time period identified by another ID variable. Interleaved time series data sets are used to store several series in the same SAS variable. Interleaved time series data sets are often used to store series of actual values, predicted values, and residuals, or series of forecast values and confidence limits for the forecasts. This is discussed in the section “Interleaved Time Series” on page 80.
Several Series with Different Ranges

Different time series can have values recorded over different time ranges. Since a SAS data set must have the same observations for all variables, when time series with different ranges are stored in the same data set, missing values must be used for the periods in which a series is not available.

Suppose that in the previous example you did not record values for CPI before August 1990 and did not record values for PPI after June 1991. The USPRICE data set could be read with the following statements:

data usprice;
   input date : monyy7. cpi ppi;
   format date monyy7.;
datalines;
jun1990   .    114.3
jul1990   .    114.5
aug1990 131.6  116.5
sep1990 132.7  118.4
oct1990 133.5  120.8
nov1990 133.8  120.1
dec1990 133.8  118.7
jan1991 134.6  119.0
feb1991 134.8  117.2
mar1991 135.0  116.2
apr1991 135.2  116.0
may1991 135.6  116.5
jun1991 136.0  116.3
jul1991 136.2    .
;
The decimal points with no digits in the data records represent missing data and are read by SAS as missing value codes.
In this example, the time range of the USPRICE data set is June 1990 through July 1991, but the time range of the CPI variable is August 1990 through July 1991, and the time range of the PPI variable is June 1990 through June 1991. SAS/ETS procedures ignore missing values at the beginning or end of a series. That is, the series is considered to begin with the first nonmissing value and end with the last nonmissing value.
Missing Values and Omitted Observations

Missing data can also occur within a series. Missing values that appear after the beginning of a time series and before the end of the time series are called embedded missing values.

Suppose that in the preceding example you did not record values for CPI for November 1990 and did not record values for PPI for both November 1990 and March 1991. The USPRICE data set could be read with the following statements:

data usprice;
   input date : monyy. cpi ppi;
   format date monyy.;
datalines;
jun1990   .    114.3
jul1990   .    114.5
aug1990 131.6  116.5
sep1990 132.7  118.4
oct1990 133.5  120.8
nov1990   .      .
dec1990 133.8  118.7
jan1991 134.6  119.0
feb1991 134.8  117.2
mar1991 135.0    .
apr1991 135.2  116.0
may1991 135.6  116.5
jun1991 136.0  116.3
jul1991 136.2    .
;
In this example, the series CPI has one embedded missing value, and the series PPI has two embedded missing values. The ranges of the two series are the same as before. Note that the observation for November 1990 has missing values for both CPI and PPI; there is no data for this period. This is an example of a missing observation. You might ask why the data record for this period is included in the example at all, since the data record contains no data. However, deleting the data record for November 1990 from the example would cause an omitted observation in the USPRICE data set. SAS/ETS procedures expect input data sets to contain observations for a contiguous time sequence. If you omit observations from a time series data set and then try to analyze the data set with SAS/ETS procedures, the omitted observations will cause errors. When all data are missing for a period, a missing observation should be included in the data set to preserve the time sequence of the series.
If observations are omitted from the data set, the EXPAND procedure can be used to fill in the gaps with missing values (or to interpolate nonmissing values) for the time series variables and with the appropriate date or datetime values for the ID variable.
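For example, a step along the following lines (a sketch based on the USPRICE data set; METHOD=NONE suppresses interpolation so the gaps are simply filled with missing values, and the output data set name is arbitrary) could restore omitted monthly observations:

proc expand data=usprice out=uspricefull
            from=month method=none;
   id date;   /* ID variable used to detect and fill the gaps */
run;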
Cross-Sectional Dimensions and BY Groups

Often, time series in a collection are related by a cross-sectional dimension. For example, the national average U.S. consumer price index data shown in the previous example can be disaggregated to show price indexes for major cities. In this case, there are several related time series: CPI for New York, CPI for Chicago, CPI for Los Angeles, and so forth. When these time series are considered as one data set, the city whose price level is measured is a cross-sectional dimension of the data.

There are two basic ways to store such related time series in a SAS data set. The first way is to use a standard form time series data set with a different variable for each series. For example, the following statements read CPI series for three major U.S. cities:

data citycpi;
   input date : monyy7. cpiny cpichi cpila;
   format date monyy7.;
datalines;
nov1989 133.200 126.700 130.000
dec1989 133.300 126.500 130.600

   ... more lines ...
The second way is to store the data in a time series cross-sectional form. In this form, the series for all cross sections are stored in one variable and a cross section ID variable is used to identify observations for the different series. The observations are sorted by the cross section ID variable and by time within each cross section.

The following statements indicate how to read the CPI series for U.S. cities in time series cross-sectional form:

data cpicity;
   length city $11;
   input city $11. date : monyy. cpi;
   format date monyy.;
datalines;
New York    JAN1990 135.100
New York    FEB1990 135.300

   ... more lines ...
proc sort data=cpicity;
   by city date;
run;
When processing a time series cross-sectional form data set with most SAS/ETS procedures, use the cross section ID variable in a BY statement to process the time series separately. The data set must be sorted by the cross section ID variable and sorted by date within each cross section. The PROC SORT step in the preceding example ensures that the CPICITY data set is correctly sorted.

When the cross section ID variable is used in a BY statement, each BY group in the data set is like a standard form time series data set. Thus, SAS/ETS procedures that expect a standard form time series data set can process time series cross-sectional data sets when a BY statement is used, producing an independent analysis for each cross section.

It is also possible to analyze time series cross-sectional data jointly. The PANEL procedure (and the older TSCSREG procedure) expects the input data to be in the time series cross-sectional form described here. See Chapter 19, "The PANEL Procedure," for more information.
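For example, a sketch such as the following (the ARIMA model settings are illustrative only) produces a separate analysis of the CPI series for each city in the CPICITY data set:

proc arima data=cpicity;
   by city;                /* one independent analysis per cross section */
   identify var=cpi(1);
   estimate q=1;
run;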
Interleaved Time Series

Normally, a time series data set has only one observation for each time period, or one observation for each time period within a cross section for a time series cross-sectional-form data set. However, it is sometimes useful to store several related time series in the same variable when the different series do not correspond to levels of a cross-sectional dimension of the data. In this case, the different time series can be interleaved. An interleaved time series data set is similar to a time series cross-sectional data set, except that the observations are sorted differently and the ID variable that distinguishes the different time series does not represent a cross-sectional dimension.

Some SAS/ETS procedures produce interleaved output data sets. The interleaved time series form is a convenient way to store procedure output when the results consist of several different kinds of series for each of several input series. (Interleaved time series are also easy to process with plotting procedures. See the section "Plotting Time Series" on page 86.)

For example, the FORECAST procedure fits a model to each input time series and computes predicted values and residuals from the model. The FORECAST procedure then uses the model to compute forecast values beyond the range of the input data and also to compute upper and lower confidence limits for the forecast values. Thus, the output from PROC FORECAST consists of up to five related time series for each variable forecast. The five resulting time series for each input series are stored in a single output variable with the same name as the series that is being forecast. The observations for the five resulting series are identified by values of the variable _TYPE_. These observations are interleaved in the output data set with observations for the same date grouped together.

The following statements show how to use PROC FORECAST to forecast the variable CPI in the USCPI data set. Figure 3.5 shows part of the output data set produced by PROC FORECAST and illustrates the interleaved structure of this data set.

proc forecast data=uscpi interval=month lead=12
              out=foreout outfull outresid;
   var cpi;
   id date;
run;

proc print data=foreout(obs=6);
run;
Figure 3.5 Partial Listing of Output Data Set Produced by PROC FORECAST

   Obs    date       _TYPE_      _LEAD_      cpi
     1    JUN1990    ACTUAL        0       129.900
     2    JUN1990    FORECAST      0       130.817
     3    JUN1990    RESIDUAL      0        -0.917
     4    JUL1990    ACTUAL        0       130.400
     5    JUL1990    FORECAST      0       130.678
     6    JUL1990    RESIDUAL      0        -0.278
Observations with _TYPE_=ACTUAL contain the values of CPI read from the input data set. Observations with _TYPE_=FORECAST contain one-step-ahead predicted values for observations with dates in the range of the input series and contain forecast values for observations for dates beyond the range of the input series. Observations with _TYPE_=RESIDUAL contain the difference between the actual and one-step-ahead predicted values. Observations with _TYPE_=U95 and _TYPE_=L95 contain the upper and lower bounds, respectively, of the 95% confidence interval for the forecasts.
Using Interleaved Data Sets as Input to SAS/ETS Procedures

Interleaved time series data sets are not directly accepted as input by SAS/ETS procedures. However, it is easy to use a WHERE statement with any procedure to subset the input data and select one of the interleaved time series as the input.

For example, to analyze the residual series contained in the PROC FORECAST output data set with another SAS/ETS procedure, include a WHERE _TYPE_='RESIDUAL' statement. The following statements perform a spectral analysis of the residuals produced by PROC FORECAST in the preceding example:

proc spectra data=foreout out=spectout;
   var cpi;
   where _type_='RESIDUAL';
run;
Combined Cross Sections and Interleaved Time Series Data Sets

Interleaved time series output data sets produced from BY-group processing of time series cross-sectional input data sets have a complex structure that combines a cross-sectional dimension, a time dimension, and the values of the _TYPE_ variable. For example, consider the PROC FORECAST output data set produced by the following statements:

title "FORECAST Output Data Set with BY Groups";
proc forecast data=cpicity interval=month
              method=expo lead=2
              out=foreout outfull outresid;
   var cpi;
   id date;
   by city;
run;

proc print data=foreout(obs=6);
run;
The output data set FOREOUT contains many different time series in the single variable CPI. (The first few observations of FOREOUT are shown in Figure 3.6.) BY groups that are identified by the variable CITY contain the result series for the different cities. Within each value of CITY, the actual, forecast, residual, and confidence limits series are stored in interleaved form, with the observations for the different series identified by the values of _TYPE_.

Figure 3.6 Combined Cross Sections and Interleaved Time Series Data

                 FORECAST Output Data Set with BY Groups

   Obs    city       date     _TYPE_      _LEAD_      cpi
     1    Chicago    JAN90    ACTUAL        0       128.100
     2    Chicago    JAN90    FORECAST      0       128.252
     3    Chicago    JAN90    RESIDUAL      0        -0.152
     4    Chicago    FEB90    ACTUAL        0       129.200
     5    Chicago    FEB90    FORECAST      0       128.896
     6    Chicago    FEB90    RESIDUAL      0         0.304
Output Data Sets of SAS/ETS Procedures

Some SAS/ETS procedures (such as PROC FORECAST) produce interleaved output data sets, and other SAS/ETS procedures produce standard form time series data sets. The form a procedure uses depends on whether the procedure is normally used to produce multiple result series for each of many input series in one step (as PROC FORECAST does).

For example, the ARIMA procedure can output actual series, forecast series, residual series, and confidence limit series just as the FORECAST procedure does. The PROC ARIMA output data set uses the standard form because PROC ARIMA is designed for the detailed analysis of one series at a time and so forecasts only one series at a time.

The following statements show the use of the ARIMA procedure to produce a forecast of the USCPI data set. Figure 3.7 shows part of the output data set that is produced by the ARIMA procedure's FORECAST statement. (The printed output from PROC ARIMA is not shown.) Compare the PROC ARIMA output data set shown in Figure 3.7 with the PROC FORECAST output data set shown in Figure 3.6.
title "PROC ARIMA Output Data Set"; proc arima data=uscpi; identify var=cpi(1); estimate q=1; forecast id=date interval=month lead=12 out=arimaout; run; proc print data=arimaout(obs=6); run;
Figure 3.7 Partial Listing of Output Data Set Produced by PROC ARIMA

                       PROC ARIMA Output Data Set

   Obs    date        cpi    FORECAST      STD        L95        U95     RESIDUAL
     1    JUN1990    129.9       .           .          .          .         .
     2    JUL1990    130.4    130.368    0.36160    129.660    131.077    0.03168
     3    AUG1990    131.6    130.881    0.36160    130.172    131.590    0.71909
     4    SEP1990    132.7    132.354    0.36160    131.645    133.063    0.34584
     5    OCT1990    133.5    133.306    0.36160    132.597    134.015    0.19421
     6    NOV1990    133.8    134.046    0.36160    133.337    134.754   -0.24552
The output data set produced by the ARIMA procedure’s FORECAST statement stores the actual values in a variable with the same name as the response series, stores the forecast series in a variable named FORECAST, stores the residuals in a variable named RESIDUAL, stores the 95% confidence limits in variables named L95 and U95, and stores the standard error of the forecast in the variable STD. This method of storing several different result series as a standard form time series data set is simple and convenient. However, it works well only for a single input series. The forecast of a single series can be stored in the variable FORECAST. But if two series are forecast, two different FORECAST variables are needed. The STATESPACE procedure handles this problem by generating forecast variable names FOR1, FOR2, and so forth. The SPECTRA procedure uses a similar method. Names such as FOR1, FOR2, RES1, RES2, and so forth require you to remember the order in which the input series are listed. This is why PROC FORECAST, which is designed to forecast a whole list of input series at once, stores its results in interleaved form. Other SAS/ETS procedures are often used for a single input series but can also be used to process several series in a single step. Thus, they are not clearly like PROC FORECAST nor clearly like PROC ARIMA in the number of input series they are designed to work with. These procedures use a third method for storing multiple result series in an output data set. These procedures store output time series in standard form (as PROC ARIMA does) but require an OUTPUT statement to give names to the result series.
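For example, the AUTOREG procedure follows this convention. In the following sketch (the intercept-only model and the names AUTOUT, CPIHAT, and CPIRESID are illustrative, not prescribed), the OUTPUT statement assigns names to the result series:

proc autoreg data=uscpi;
   model cpi = / nlag=1;                    /* intercept-only model with AR(1) errors */
   output out=autout p=cpihat r=cpiresid;   /* named predicted and residual series */
run;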
Time Series Periodicity and Time Intervals

A fundamental characteristic of time series data is how frequently the observations are spaced in time. How often the observations of a time series occur is called the sampling frequency or the periodicity of the series. For example, a time series with one observation each month has a monthly sampling frequency or monthly periodicity and so is called a monthly time series.

In SAS, data periodicity is described by specifying periodic time intervals into which the dates of the observations fall. For example, the SAS time interval MONTH divides time into calendar months.

Many SAS/ETS procedures enable you to specify the periodicity of the input data set with the INTERVAL= option. For example, specifying INTERVAL=MONTH indicates that the procedure should expect the ID variable to contain SAS date values, and that the date value for each observation should fall in a separate calendar month. The EXPAND procedure uses interval name values with the FROM= and TO= options to control the interpolation of time series from one periodicity to another.

SAS also uses time intervals in several other ways. In addition to indicating the periodicity of time series data sets, time intervals are used with the interval functions INTNX and INTCK and for controlling the plot axis and reference lines for plots of data over time.
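As a quick illustration of these two functions (a minimal sketch), INTNX advances a date to the start of a later interval, and INTCK counts the interval boundaries crossed between two dates:

data _null_;
   next  = intnx( 'month', '17oct1991'd, 1 );             /* 01NOV1991 */
   count = intck( 'month', '1jan1990'd, '17oct1991'd );   /* 21 month boundaries */
   put next= date9. count=;
run;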
Specifying Time Intervals

Intervals are specified in SAS by using interval names such as YEAR, QTR, MONTH, DAY, and so forth. Table 3.3 summarizes the basic types of intervals.

Table 3.3 Basic Interval Types

   Name         Periodicity
   YEAR         yearly
   SEMIYEAR     semiannual
   QTR          quarterly
   MONTH        monthly
   SEMIMONTH    1st and 16th of each month
   TENDAY       1st, 11th, and 21st of each month
   WEEK         weekly
   WEEKDAY      daily ignoring weekend days
   DAY          daily
   HOUR         hourly
   MINUTE       every minute
   SECOND       every second
Interval names can be abbreviated in various ways. For example, you could specify monthly intervals as MONTH, MONTHS, MONTHLY, or just MON. SAS accepts all these forms as equivalent.
Interval names can also be qualified with a multiplier to indicate multi-period intervals. For example, biennial intervals are specified as YEAR2. Interval names can also be qualified with a shift index to indicate intervals with different starting points. For example, fiscal years starting in July are specified as YEAR.7. Intervals are classified as either date or datetime intervals. Date intervals are used with SAS date values, while datetime intervals are used with SAS datetime values. The interval types YEAR, SEMIYEAR, QTR, MONTH, SEMIMONTH, TENDAY, WEEK, WEEKDAY, and DAY are date intervals. HOUR, MINUTE, and SECOND are datetime intervals. Date intervals can be turned into datetime intervals for use with datetime values by prefixing the interval name with ‘DT’. Thus DTMONTH intervals are like MONTH intervals but are used with datetime ID values instead of date ID values. See Chapter 4, “Date Intervals, Formats, and Functions,” for more information about specifying time intervals and for a detailed reference to the different kinds of intervals available.
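For example, the following sketch shows how such qualified interval names behave with the INTNX function:

data _null_;
   b = intnx( 'year2',  '17oct1991'd, 0 );   /* start of the current two-year interval: 01JAN1990 */
   f = intnx( 'year.7', '17oct1991'd, 0 );   /* start of the current July-based fiscal year: 01JUL1991 */
   put b= date9. f= date9.;
run;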
Using Intervals with SAS/ETS Procedures

SAS/ETS procedures use the date or datetime interval and the ID variable in the following ways:
- to validate the data periodicity. The ID variable is used to check the data and verify that successive observations have valid ID values that correspond to successive time intervals.
- to check for gaps in the input observations. For example, if INTERVAL=MONTH and an input observation for January 1990 is followed by an observation for April 1990, there is a gap in the input data with two omitted observations.
- to label forecast observations in the output data set. The values of the ID variable for the forecast observations after the end of the input data set are extrapolated according to the frequency specifications of the INTERVAL= option.
Time Intervals, the Time Series Forecasting System, and the Time Series Viewer

Time intervals are used in the Time Series Forecasting System and Time Series Viewer to identify the number of seasonal cycles or seasonality associated with a DATE, DATETIME, or TIME ID variable. For example, monthly time series have a seasonality of 12 because there are 12 months in a year; quarterly time series have a seasonality of 4 because there are four quarters in a year. The seasonality is used to analyze seasonal properties of time series data and to estimate seasonal forecasting methods.
Plotting Time Series

This section discusses SAS procedures that are available for plotting time series data, but it covers only certain aspects of the use of these procedures with time series data.

The Time Series Viewer displays and analyzes time series plots for time series data sets that do not contain cross sections. See Chapter 39, "Getting Started with Time Series Forecasting."

The SGPLOT procedure produces high resolution color graphics plots. See the SAS/GRAPH: Statistical Graphics Procedures Guide and SAS/GRAPH: Reference for more information.

The PLOT procedure and the TIMEPLOT procedure produce low-resolution line-printer type plots. See the Base SAS Procedures Guide for information about these procedures.
Using the Time Series Viewer

The following command starts the Time Series Viewer to display the plot of CPI in the USCPI data set against DATE. (The USCPI data set was shown in the previous example; the time series used in the following example contains more observations than previously shown.)

tsview data=uscpi var=cpi timeid=date
The TSVIEW DATA= option specifies the data set to be viewed; the VAR= option specifies the variable that contains the time series observations; the TIMEID= option specifies the time series ID variable.

The Time Series Viewer can also be invoked by selecting Solutions → Analyze → Time Series Viewer from the menu in the SAS Display Manager.
Using PROC SGPLOT

The following statements use the SGPLOT procedure to plot CPI in the USCPI data set against DATE. (The USCPI data set was shown in a previous example; the data set plotted in the following example contains more observations than shown previously.)

title "Plot of USCPI Data";
proc sgplot data=uscpi;
   series x=date y=cpi / markers;
run;
The plot is shown in Figure 3.8.
Figure 3.8 Plot of Monthly CPI Over Time
Controlling the Time Axis: Tick Marks and Reference Lines

It is possible to control the spacing of the tick marks on the time axis. The following statements use the XAXIS statement to tell PROC SGPLOT to mark the axis at the start of each quarter:

proc sgplot data=uscpi;
   series x=date y=cpi / markers;
   format date yyqc.;
   xaxis values=('1jan90'd to '1jul91'd by qtr);
run;
The plot is shown in Figure 3.9.
Figure 3.9 Plot of Monthly CPI Over Time
Overlay Plots of Different Variables You can plot two or more series stored in different variables on the same graph by specifying multiple plot requests in one SGPLOT statement. For example, the following statements plot the CPI, FORECAST, L95, and U95 variables produced by PROC ARIMA in a previous example. A reference line is drawn to mark the start of the forecast period. Quarterly tick marks with YYQC format date values are used. title "ARIMA Forecasts of CPI"; proc arima data=uscpi; identify var=cpi(1); estimate q=1; forecast id=date interval=month lead=12 out=arimaout; run; title "ARIMA forecasts of CPI"; proc sgplot data=arimaout noautolegend; scatter x=date y=cpi;
scatter x=date y=forecast / markerattrs=(symbol=asterisk); scatter x=date y=l95 / markerattrs=(symbol=asterisk color=green); scatter x=date y=u95 / markerattrs=(symbol=asterisk color=green); format date yyqc4.; xaxis values=('1jan90'd to '1jul92'd by qtr); refline '15jul91'd / axis=x; run;
The plot is shown in Figure 3.10. Figure 3.10 Plot of ARIMA Forecast
Overlay Plots of Interleaved Series You can also plot several series on the same graph when the different series are stored in the same variable in interleaved form. Plot interleaved time series by using the values of the ID variable in the GROUP= option to distinguish the different series. The following example plots the output data set produced by PROC FORECAST in a previous example. Since the residual series has a different scale than the other series, it is excluded from the plot with a WHERE statement.
The _TYPE_ variable is used in the GROUP= option of the SCATTER statement to identify the different series and to determine the plot attributes used for each one.

title "Plot of Forecasts of USCPI Data";
proc forecast data=uscpi interval=month lead=12
              out=foreout outfull outresid;
   var cpi;
   id date;
run;

proc sgplot data=foreout;
   where _type_ ^= 'RESIDUAL';
   scatter x=date y=cpi / group=_type_ markerattrs=(symbol=asterisk);
   format date yyqc4.;
   xaxis values=('1jan90'd to '1jul92'd by qtr);
   refline '15jul91'd / axis=x;
run;
The plot is shown in Figure 3.11. Figure 3.11 Plot of Forecast
Residual Plots The following example plots the residuals series that was excluded from the plot in the previous example. The NEEDLE statement specifies a needle plot, so that each residual point is plotted as a vertical line showing deviation from zero. proc sgplot data=foreout; where _type_ = 'RESIDUAL'; needle x=date y=cpi / markers; format date yyqc4.; xaxis values=('1jan90'd to '1jul91'd by qtr); run;
The plot is shown in Figure 3.12. Figure 3.12 Plot of Residuals
Using PROC PLOT The following statements use the PLOT procedure in Base SAS to plot CPI in the USCPI data set against DATE. (The data set plotted contains more observations than shown in the previous examples.) The plotting character used is a plus sign (+).
title "Plot of USCPI Data"; proc plot data=uscpi; plot cpi * date = '+' / vaxis= 129 to 137 by 1; run;
The plot is shown in Figure 3.13.

Figure 3.13 Plot of Monthly CPI Over Time

[Line-printer plot produced by PROC PLOT: US Consumer Price Index (vertical axis, 129 to 137 by 1) plotted with the '+' symbol against date (horizontal axis, MAY1990 through OCT1991).]
Using PROC TIMEPLOT The TIMEPLOT procedure in Base SAS plots time series data vertically on the page instead of horizontally across the page as the PLOT procedure does. PROC TIMEPLOT can also print the data values as well as plot them. The following statements use the TIMEPLOT procedure to plot CPI in the USCPI data set. Only the last 14 observations are included in this example. The plot is shown in Figure 3.14.
title "Plot of USCPI Data"; proc timeplot data=uscpi; plot cpi; id date; where date >= '1jun90'd; run;
Figure 3.14 Output Produced by PROC TIMEPLOT

Plot of USCPI Data

date      US Consumer Price Index
JUN1990   129.90
JUL1990   130.40
AUG1990   131.60
SEP1990   132.70
OCT1990   133.50
NOV1990   133.80
DEC1990   133.80
JAN1991   134.60
FEB1991   134.80
MAR1991   135.00
APR1991   135.20
MAY1991   135.60
JUN1991   136.00
JUL1991   136.20

[To the right of the listing, PROC TIMEPLOT draws each value as a 'c' marker on a horizontal scale from the minimum (129.9) to the maximum (136.2).]
The TIMEPLOT procedure has several interesting features not discussed here. See “The TIMEPLOT Procedure” in the Base SAS Procedures Guide for more information.
Using PROC GPLOT The GPLOT procedure in SAS/GRAPH software can also be used to plot time series data, although the newer SGPLOT procedure is easier to use. The following is an example of how GPLOT can be used to produce a plot similar to the graph produced by PROC SGPLOT in the preceding section. title "Plot of USCPI Data"; proc gplot data=uscpi; symbol i=spline v=circle h=2; plot cpi * date; run;
The plot is shown in Figure 3.15.
Figure 3.15 Plot of Monthly CPI Over Time
For more information about the GPLOT procedure, see SAS/GRAPH: Reference.
Calendar and Time Functions Calendar and time functions convert calendar and time variables such as YEAR, MONTH, DAY, and HOUR, MINUTE, SECOND into SAS date or datetime values, and vice versa. The SAS calendar and time functions are DATEJUL, DATEPART, DAY, DHMS, HMS, HOUR, JULDATE, MDY, MINUTE, MONTH, QTR, SECOND, TIMEPART, WEEKDAY, YEAR, and YYQ. See SAS Language Reference: Dictionary for more details about these functions.
Computing Dates from Calendar Variables

The MDY function converts MONTH, DAY, and YEAR values to a SAS date value. For example, MDY(10,17,2010) returns the SAS date value '17OCT2010'D. The YYQ function computes the SAS date for the first day of a quarter. For example, YYQ(2010,4) returns the SAS date value '1OCT2010'D. The DATEJUL function computes the SAS date for a Julian date. For example, DATEJUL(2010290) returns the SAS date '17OCT2010'D. The YYQ and MDY functions are useful for creating SAS date variables when the ID values recorded in the data are year and quarter; year and month; or year, month, and day. For example, the following statements read quarterly data from records in which dates are coded as separate year and quarter values. The YYQ function is used to compute the variable DATE.

data usecon;
   input year qtr gnp;
   date = yyq( year, qtr );
   format date yyqc.;
datalines;
1990 1 5375.4
1990 2 5443.3

   ... more lines ...
The monthly USCPI data shown in a previous example contained time ID values represented in the MONYY format. If the data records instead contain separate year and month values, the data can be read in and the DATE variable computed with the following statements: data uscpi; input month year cpi; date = mdy( month, 1, year ); format date monyy.; datalines; 6 90 129.9 7 90 130.4 ... more lines ...
Computing Calendar Variables from Dates The functions YEAR, MONTH, DAY, WEEKDAY, and JULDATE compute calendar variables from SAS date values.
Returning to the example of reading the USCPI data from records that contain date values represented in the MONYY format, you can find the month and year of each observation from the SAS dates of the observations by using the following statements. data uscpi; input date monyy7. cpi; format date monyy7.; year = year( date ); month = month( date ); datalines; jun1990 129.9 jul1990 130.4 ... more lines ...
Converting between Date, Datetime, and Time Values The DATEPART function computes the SAS date value for the date part of a SAS datetime value. The TIMEPART function computes the SAS time value for the time part of a SAS datetime value. The HMS function computes SAS time values from HOUR, MINUTE, and SECOND time variables. The DHMS function computes a SAS datetime value from a SAS date value and HOUR, MINUTE, and SECOND time variables. See the section “SAS Date, Time, and Datetime Functions” on page 147 for more information about these functions.
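As a quick illustration, the following statements (a minimal sketch; the datetime constant is made up for this example) exercise each of these four functions:

data _null_;
   dt  = '17oct1991:14:45:32'dt;   /* a SAS datetime value                 */
   d   = datepart( dt );           /* the SAS date value '17OCT1991'D      */
   t   = timepart( dt );           /* the SAS time value for 14:45:32      */
   t2  = hms( 14, 45, 32 );        /* the same time value from HOUR,       */
                                   /* MINUTE, and SECOND values            */
   dt2 = dhms( d, 14, 45, 32 );    /* rebuilds the original datetime value */
   put d= date9. t= time8. dt2= datetime18.;
run;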
Computing Datetime Values To compute datetime ID values from calendar and time variables, first compute the date and then compute the datetime with DHMS. For example, suppose you read tri-hourly temperature data with time recorded as YEAR, MONTH, DAY, and HOUR. The following statements show how to compute the ID variable DATETIME: data weather; input year month day hour temp; datetime = dhms( mdy( month, day, year ), hour, 0, 0 ); format datetime datetime10.; datalines; 91 10 16 21 61 91 10 17 0 56 91 10 17 3 53 ... more lines ...
Computing Calendar and Time Variables The functions HOUR, MINUTE, and SECOND compute time variables from SAS datetime values. The DATEPART function and the date-to-calendar variables functions can be combined to compute calendar variables from datetime values. For example, suppose the date and time of the tri-hourly temperature data in the preceding example were recorded as datetime values in the datetime format. The following statements show how to compute the YEAR, MONTH, DAY, and HOUR of each observation and include these variables in the SAS data set: data weather; input datetime : datetime13. temp; format datetime datetime10.; hour = hour( datetime ); date = datepart( datetime ); year = year( date ); month = month( date ); day = day( date ); datalines; 16oct91:21:00 61 17oct91:00:00 56 17oct91:03:00 53 ... more lines ...
Interval Functions INTNX and INTCK The SAS interval functions INTNX and INTCK perform calculations with date values, datetime values, and time intervals. They can be used for calendar calculations with SAS date values to increment date values or datetime values by intervals and to count time intervals between dates. The INTNX function increments dates by intervals. INTNX computes the date or datetime of the start of the interval a specified number of intervals from the interval that contains a given date or datetime value. The form of the INTNX function is INTNX ( interval, from, n < , alignment > ) ;
The arguments to the INTNX function are as follows:

interval
   is a character constant or variable that contains an interval name

from
   is a SAS date value (for date intervals) or datetime value (for datetime intervals)

n
   is the number of intervals to increment from the interval that contains the from value

alignment
   controls the alignment of SAS dates, within the interval, used to identify output observations. Allowed values are BEGINNING, MIDDLE, END, and SAMEDAY.

The number of intervals to increment, n, can be positive, negative, or zero. For example, the statement NEXTMON=INTNX('MONTH',DATE,1) assigns to the variable NEXTMON the date of the first day of the month following the month that contains the value of DATE. Thus INTNX('MONTH','21OCT2007'D,1) returns the date 1 November 2007.

The INTCK function counts the number of interval boundaries between two date values or between two datetime values. The form of the INTCK function is

INTCK ( interval, from, to ) ;

The arguments of the INTCK function are as follows:

interval
   is a character constant or variable that contains an interval name

from
   is the starting date value (for date intervals) or datetime value (for datetime intervals)

to
   is the ending date value (for date intervals) or datetime value (for datetime intervals)

For example, the statement NEWYEARS=INTCK('YEAR',DATE1,DATE2) assigns to the variable NEWYEARS the number of New Year's Days between the two dates.
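The following statements (a small sketch; the date constants are chosen only for illustration) show both functions in action:

data _null_;
   nextmon  = intnx( 'month', '21oct2007'd, 1 );            /* start of the next month      */
   newyears = intck( 'year', '15nov1991'd, '15feb1993'd );  /* New Year's Days crossed: 2   */
   put nextmon= date9. newyears=;
run;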
Incrementing Dates by Intervals Use the INTNX function to increment dates by intervals. For example, suppose you want to know the date of the start of the week that is six weeks from the week of 17 October 1991. The function INTNX(’WEEK’,’17OCT91’D,6) returns the SAS date value ’24NOV1991’D. One practical use of the INTNX function is to generate periodic date values. For example, suppose the monthly U.S. Consumer Price Index data in a previous example were recorded without any time identifier on the data records. Given that you know the first observation is for June 1990, the following statements use the INTNX function to compute the ID variable DATE for each observation: data uscpi;
input cpi; date = intnx( 'month', '1jun1990'd, _n_-1 ); format date monyy7.; datalines; 129.9 130.4 ... more lines ...
The automatic variable _N_ counts the number of times the DATA step program has executed; in this case _N_ contains the observation number. Thus _N_–1 is the increment needed from the first observation date. Alternatively, you could increment from the month before the first observation, in which case the INTNX function in this example would be written INTNX(’MONTH’,’1MAY1990’D,_N_).
Alignment of SAS Dates

Any date within the time interval that corresponds to an observation of a periodic time series can serve as an ID value for the observation. For example, the USCPI data in a previous example might have been recorded with dates at the 15th of each month. The person recording the data might reason that since the CPI values are monthly averages, midpoints of the months might be the appropriate ID values. However, as far as SAS/ETS procedures are concerned, what is important about monthly data is the month of each observation, not the exact date of the ID value. If you indicate that the data are monthly (with an INTERVAL=MONTH option), SAS/ETS procedures ignore the day of the month in processing the ID variable. The MONYY format also ignores the day of the month. Thus, you could read in the monthly USCPI data with mid-month DATE values by using the following statements:

data uscpi;
   input date : date9. cpi;
   format date monyy7.;
datalines;
15jun1990 129.9
15jul1990 130.4

   ... more lines ...
The results of using this version of the USCPI data set for analysis with SAS/ETS procedures would be the same as with first-of-month values for DATE. Although you can use any date within the interval as an ID value for the interval, you might find working with time series in SAS less confusing if you always use date ID values normalized to the start of the interval. For some applications it might be preferable to use end of period dates, such as 31Jan1994, 28Feb1994, 31Mar1994, . . . , 31Dec1994. For other applications, such as plotting time series, it might be more convenient to use interval midpoint dates to identify the observations.
(Some SAS/ETS procedures provide an ALIGN= option to control the alignment of dates for output time series observations. In addition, the INTNX library function supports an optional argument to specify the alignment of the returned date value.) To normalize date values to the start of intervals, use the INTNX function with a 0 increment. The INTNX function with an increment of 0 computes the date of the first day of the interval (or the first second of the interval for datetime values). For example, INTNX(’MONTH’,’17OCT1991’D,0,’BEG’) returns the date ’1OCT1991’D. The following statements show how the preceding example can be changed to normalize the midmonth DATE values to first-of-month and end-of-month values. For exposition, the first-of-month value is transformed back into a middle-of-month value. data uscpi; input date : date9. cpi; format date monyy7.; monthbeg = intnx( 'month', date, 0, 'beg' ); midmonth = intnx( 'month', monthbeg, 0, 'mid' ); monthend = intnx( 'month', date, 0, 'end' ); datalines; 15jun1990 129.9 15jul1990 130.4 ... more lines ...
If you want to compute the date of a particular day within an interval, you can use calendar functions, or you can increment the starting date of the interval by a number of days. The following example shows three ways to compute the seventh day of the month: data test; set uscpi; mon07_1 = mdy( month(date), 7, year(date) ); mon07_2 = intnx( 'month', date, 0, 'beg' ) + 6; mon07_3 = intnx( 'day', date, 6 ); run;
Computing the Width of a Time Interval

To compute the width of a time interval, subtract the ID value of the start of the current interval from the ID value of the start of the next interval. If the ID values are SAS dates, the width is in days. If the ID values are SAS datetime values, the width is in seconds. For example, the following statements show how to add a variable WIDTH to the USCPI data set that contains the number of days in the month for each observation:

data uscpi;
   input date : date9. cpi;
   format date monyy7.;
   width = intnx( 'month', date, 1 ) - intnx( 'month', date, 0 );
datalines;
15jun1990 129.9
15jul1990 130.4
15aug1990 131.6

   ... more lines ...
Computing the Ceiling of an Interval To shift a date to the start of the next interval if it is not already at the start of an interval, subtract 1 from the date and use INTNX to increment the date by 1 interval. For example, the following statements add the variable NEWYEAR to the monthly USCPI data set. The variable NEWYEAR contains the date of the next New Year’s Day. NEWYEAR contains the same value as DATE when the DATE value is the start of year and otherwise contains the date of the start of the next year. data test; set uscpi; newyear = intnx( 'year', date - 1, 1 ); format newyear date.; run;
Counting Time Intervals

Use the INTCK function to count the number of interval boundaries between two dates. Note that the INTCK function counts the number of times the beginning of an interval is reached in moving from the first date to the second. It does not count the number of complete intervals between two dates. Following are two examples:

- The function INTCK('MONTH','1JAN1991'D,'31JAN1991'D) returns 0, since the two dates are within the same month.

- The function INTCK('MONTH','31JAN1991'D,'1FEB1991'D) returns 1, since the two dates lie in different months that are one month apart.

When the first date is later than the second date, INTCK returns a negative count. For example, the function INTCK('MONTH','1FEB1991'D,'31JAN1991'D) returns -1.

The following example shows how to use the INTCK function with shifted interval specifications to count the number of Sundays, Mondays, Tuesdays, and so forth, in each month. The variables NSUNDAY, NMONDAY, NTUESDAY, and so forth, are added to the USCPI data set.
data uscpi; set uscpi; d0 = intnx( 'month', date, 0 ) - 1; d1 = intnx( 'month', date, 1 ) - 1; nSunday = intck( 'week.1', d0, d1 ); nMonday = intck( 'week.2', d0, d1 ); nTuesday = intck( 'week.3', d0, d1 ); nWedday = intck( 'week.4', d0, d1 ); nThurday = intck( 'week.5', d0, d1 ); nFriday = intck( 'week.6', d0, d1 ); nSatday = intck( 'week.7', d0, d1 ); drop d0 d1; run;
Since the INTCK function counts the number of interval beginning dates between two dates, the number of Sundays is computed by counting the number of week boundaries between the last day of the previous month and the last day of the current month. To count Mondays, Tuesdays, and so forth, shifted week intervals are used. The interval type WEEK.2 specifies weekly intervals starting on Mondays, WEEK.3 specifies weeks starting on Tuesdays, and so forth.
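As a quick check of the shifted-interval idea (the dates below are chosen only for illustration), the following statements count the Mondays in October 1991:

data _null_;
   nmonday = intck( 'week.2', '30sep1991'd, '31oct1991'd );
   put nmonday=;   /* October 1991 has four Mondays: the 7th, 14th, 21st, and 28th */
run;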
Checking Data Periodicity Suppose you have a time series data set and you want to verify that the data periodicity is correct, the observations are dated correctly, and the data set is sorted by date. You can use the INTCK function to compare the date of the current observation with the date of the previous observation and verify that the dates fall into consecutive time intervals. For example, the following statements verify that the data set USCPI is a correctly dated monthly data set. The RETAIN statement is used to hold the date of the previous observation, and the automatic variable _N_ is used to start the verification process with the second observation. data _null_; set uscpi; retain prevdate; if _n_ > 1 then if intck( 'month', prevdate, date ) ^= 1 then put "Bad date sequence at observation number " _n_; prevdate = date; run;
Filling In Omitted Observations in a Time Series Data Set Most SAS/ETS procedures expect input data to be in the standard form, with no omitted observations in the sequence of time periods. When data are missing for a time period, the data set should contain a missing observation, in which all variables except the ID variables have missing values.
You can replace omitted observations in a time series data set with missing observations by using the EXPAND procedure. The following statements create a monthly data set, OMITTED, from data lines that contain records for an intermittent sample of months. (Data values are not shown.) The OMITTED data set is sorted to make sure it is in time order.

data omitted;
   input date : monyy7. x y z;
   format date monyy7.;
datalines;
jan1991 ...
mar1991 ...
apr1991 ...
jun1991 ...

   ... etc. ...

;

proc sort data=omitted;
   by date;
run;
This data set is converted to a standard form time series data set by the following PROC EXPAND step. The TO= option specifies that monthly data is to be output, while the METHOD=NONE option specifies that no interpolation is to be performed, so that the variables X, Y, and Z in the output data set STANDARD will have missing values for the omitted time periods that are filled in by the EXPAND procedure. proc expand data=omitted out=standard to=month method=none; id date; run;
Using Interval Functions for Calendar Calculations With a little thought, you can come up with a formula that involves INTNX and INTCK functions and different interval types to perform almost any calendar calculation. For example, suppose you want to know the date of the third Wednesday in the month of October 1991. The answer can be computed as intnx( 'week.4', '1oct91'd - 1, 3 )
which returns the SAS date value ’16OCT91’D.
Consider this more complex example: how many weekdays are there between 17 October 1991 and the second Friday in November 1991, inclusive? The following formula computes the number of weekdays between the date value contained in the variable DATE and the second Friday of the following month (including the ending dates of this period): n = intck( 'weekday', date - 1, intnx( 'week.6', intnx( 'month', date, 1 ) - 1, 2 ) + 1 );
Setting DATE to ’17OCT91’D and applying this formula produces the answer, N=17.
Lags, Leads, Differences, and Summations When working with time series data, you sometimes need to refer to the values of a series in previous or future periods. For example, the usual interest in the consumer price index series shown in previous examples is how fast the index is changing, rather than the actual level of the index. To compute a percent change, you need both the current and the previous values of the series. When you model a time series, you might want to use the previous values of other series as explanatory variables. This section discusses how to use the DATA step to perform operations over time: lags, differences, leads, summations over time, and percent changes. The EXPAND procedure can also be used to perform many of these operations; see Chapter 14, “The EXPAND Procedure,” for more information. See also the section “Transforming Time Series” on page 113.
The LAG and DIF Functions The DATA step provides two functions, LAG and DIF, for accessing previous values of a variable or expression. These functions are useful for computing lags and differences of series. For example, the following statements add the variables CPILAG and CPIDIF to the USCPI data set. The variable CPILAG contains lagged values of the CPI series. The variable CPIDIF contains the changes of the CPI series from the previous period; that is, CPIDIF is CPI minus CPILAG. The new data set is shown in part in Figure 3.16. data uscpi; set uscpi; cpilag = lag( cpi ); cpidif = dif( cpi ); run; proc print data=uscpi;
run;
Figure 3.16 USCPI Data Set with Lagged and Differenced Series

Plot of USCPI Data

Obs    date       cpi     cpilag    cpidif
  1    JUN1990   129.9        .        .
  2    JUL1990   130.4    129.9      0.5
  3    AUG1990   131.6    130.4      1.2
  4    SEP1990   132.7    131.6      1.1
  5    OCT1990   133.5    132.7      0.8
  6    NOV1990   133.8    133.5      0.3
  7    DEC1990   133.8    133.8      0.0
  8    JAN1991   134.6    133.8      0.8
  9    FEB1991   134.8    134.6      0.2
 10    MAR1991   135.0    134.8      0.2
 11    APR1991   135.2    135.0      0.2
 12    MAY1991   135.6    135.2      0.4
 13    JUN1991   136.0    135.6      0.4
 14    JUL1991   136.2    136.0      0.2
Understanding the DATA Step LAG and DIF Functions When used in this simple way, LAG and DIF act as lag and difference functions. However, it is important to keep in mind that, despite their names, the LAG and DIF functions available in the DATA step are not true lag and difference functions. Rather, LAG and DIF are queuing functions that remember and return argument values from previous calls. The LAG function remembers the value you pass to it and returns as its result the value you passed to it on the previous call. The DIF function works the same way but returns the difference between the current argument and the remembered value. (LAG and DIF return a missing value the first time the function is called.) A true lag function does not return the value of the argument for the “previous call,” as do the DATA step LAG and DIF functions. Instead, a true lag function returns the value of its argument for the “previous observation,” regardless of the sequence of previous calls to the function. Thus, for a true lag function to be possible, it must be clear what the “previous observation” is. If the data are sorted chronologically, then LAG and DIF act as true lag and difference functions. If in doubt, use PROC SORT to sort your data before using the LAG and DIF functions. Beware of missing observations, which can cause LAG and DIF to return values that are not the actual lag and difference values. The DATA step is a powerful tool that can read any number of observations from any number of input files or data sets, can create any number of output data sets, and can write any number of output observations to any of the output data sets, all in the same program. Thus, in general, it is not clear what “previous observation” means in a DATA step program. In a DATA step program, the “previous observation” exists only if you write the program in a simple way that makes this concept meaningful.
Since, in general, the previous observation is not clearly defined, it is not possible to make true lag or difference functions for the DATA step. Instead, the DATA step provides queuing functions that make it easy to compute lags and differences.
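The queuing behavior is easy to demonstrate by calling LAG only on some iterations. The following sketch (a contrived example made up for this point) calls LAG(I) only when I is odd, so the value returned is the argument from the previous call, not from the previous loop iteration:

data _null_;
   do i = 1 to 5;
      if mod( i, 2 ) = 1 then lagodd = lag( i );   /* called for i = 1, 3, 5 only */
      put i= lagodd=;
   end;
run;

LAGODD is missing when I=1, is 1 when I=3, and is 3 when I=5: each call to LAG returns the argument value from the call before it.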
Pitfalls of DATA Step LAG and DIF Functions The LAG and DIF functions compute lags and differences provided that the sequence of calls to the function corresponds to the sequence of observations in the output data set. However, any complexity in the DATA step that breaks this correspondence causes the LAG and DIF functions to produce unexpected results. For example, suppose you want to add the variable CPILAG to the USCPI data set, as in the previous example, and you also want to subset the series to 1991 and later years. You might use the following statements: data subset; set uscpi; if date >= '1jan1991'd; cpilag = lag( cpi ); /* WRONG PLACEMENT! */ run;
If the subsetting IF statement comes before the LAG function call, the value of CPILAG will be missing for January 1991, even though a value for December 1990 is available in the USCPI data set. To avoid losing this value, you must rearrange the statements to ensure that the LAG function is actually executed for the December 1990 observation. data subset; set uscpi; cpilag = lag( cpi ); if date >= '1jan1991'd; run;
In other cases, the subsetting statement should come before the LAG and DIF functions. For example, the following statements subset the FOREOUT data set shown in a previous example to select only _TYPE_=RESIDUAL observations and also to compute the variable LAGRESID: data residual; set foreout; if _type_ = "RESIDUAL"; lagresid = lag( cpi ); run;
Another pitfall of LAG and DIF functions arises when they are used to process time series cross-sectional data sets. For example, suppose you want to add the variable CPILAG to the CPICITY data set shown in a previous example. You might use the following statements:

data cpicity;
   set cpicity;
   cpilag = lag( cpi );
run;
However, these statements do not yield the desired result. In the data set produced by these statements, the value of CPILAG for the first observation for the first city is missing (as it should be), but in the first observation for all later cities, CPILAG contains the last value for the previous city. To correct this, set the lagged variable to missing at the start of each cross section, as follows: data cpicity; set cpicity; by city date; cpilag = lag( cpi ); if first.city then cpilag = .; run;
Alternatives to LAG and DIF Functions You can also use the EXPAND procedure to compute lags and differences. For example, the following statements compute lag and difference variables for CPI: proc expand data=uscpi out=uscpi method=none; id date; convert cpi=cpilag / transform=( lag 1 ); convert cpi=cpidif / transform=( dif 1 ); run;
You can also calculate lags and differences in the DATA step without using LAG and DIF functions. For example, the following statements add the variables CPILAG and CPIDIF to the USCPI data set: data uscpi; set uscpi; retain cpilag; cpidif = cpi - cpilag; output; cpilag = cpi; run;
The RETAIN statement prevents the DATA step from reinitializing CPILAG to a missing value at the start of each iteration and thus allows CPILAG to retain the value of CPI assigned to it in the last statement. The OUTPUT statement causes the output observation to contain values of the variables before CPILAG is reassigned the current value of CPI in the last statement. This is the approach that must be used if you want to build a variable that is a function of its previous lags.
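For example, the following sketch uses this pattern to build an exponentially smoothed copy of CPI, a variable that depends on its own previous value and therefore cannot be computed with the LAG function. (The smoothing weight 0.3 is an arbitrary value chosen for illustration.)

data smooth;
   set uscpi;
   retain smooth;                             /* carry SMOOTH forward across iterations */
   if smooth = . then smooth = cpi;           /* initialize at the first observation    */
   else smooth = 0.3 * cpi + 0.7 * smooth;    /* recursion on the previous SMOOTH value */
run;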
LAG and DIF Functions in PROC MODEL The preceding discussion of LAG and DIF functions applies to LAG and DIF functions available in the DATA step. However, LAG and DIF functions are also used in the MODEL procedure. The MODEL procedure LAG and DIF functions do not work like the DATA step LAG and DIF functions. The LAG and DIF functions supported by PROC MODEL are true lag and difference functions, not queuing functions.
Unlike the DATA step, the MODEL procedure processes observations from a single input data set, so the “previous observation” is always clearly defined in a PROC MODEL program. Therefore, PROC MODEL is able to define LAG and DIF as true lagging functions that operate on values from the previous observation. See Chapter 18, “The MODEL Procedure,” for more information about LAG and DIF functions in the MODEL procedure.
Multiperiod Lags and Higher-Order Differencing To compute lags at a lagging period greater than 1, add the lag length to the end of the LAG keyword to specify the lagging function needed. For example, the LAG2 function returns the value of its argument two calls ago, the LAG3 function returns the value of its argument three calls ago, and so forth. To compute differences at a lagging period greater than 1, add the lag length to the end of the DIF keyword. For example, the DIF2 function computes the differences between the value of its argument and the value of its argument two calls ago. (The maximum lagging period is 100.) The following statements add the variables CPILAG12 and CPIDIF12 to the USCPI data set. CPILAG12 contains the value of CPI from the same month one year ago. CPIDIF12 contains the change in CPI from the same month one year ago. (In this case, the first 12 values of CPILAG12 and CPIDIF12 are missing.) data uscpi; set uscpi; cpilag12 = lag12( cpi ); cpidif12 = dif12( cpi ); run;
To compute second differences, take the difference of the difference. To compute higher-order differences, nest DIF functions to the order needed. For example, the following statements compute the second difference of CPI: data uscpi; set uscpi; cpi2dif = dif( dif( cpi ) ); run;
Multiperiod lags and higher-order differencing can be combined. For example, the following statements compute monthly changes in the inflation rate, with inflation rate computed as percent change in CPI from the same month one year ago: data uscpi; set uscpi; infchng = dif( 100 * dif12( cpi ) / lag12( cpi ) ); run;
Percent Change Calculations There are several common ways to compute the percent change in a time series. This section illustrates the use of LAG and DIF functions by showing SAS statements for various kinds of percent change calculations.
Computing Period-to-Period Change To compute percent change from the previous period, divide the difference of the series by the lagged value of the series and multiply by 100. data uscpi; set uscpi; pctchng = dif( cpi ) / lag( cpi ) * 100; label pctchng = "Monthly Percent Change, At Monthly Rates"; run;
Often, changes from the previous period are expressed at annual rates. This is done by raising the current-to-previous-period ratio to the power of the number of periods in a year and expressing the result as a percent change. For example, the following statements compute the month-over-month change in CPI as a percent change at annual rates:

data uscpi;
   set uscpi;
   pctchng = ( ( cpi / lag( cpi ) ) ** 12 - 1 ) * 100;
   label pctchng = "Monthly Percent Change, At Annual Rates";
run;
Computing Year-over-Year Change To compute percent change from the same period in the previous year, use LAG and DIF functions with a lagging period equal to the number of periods in a year. (For quarterly data, use LAG4 and DIF4. For monthly data, use LAG12 and DIF12.) For example, the following statements compute monthly percent change in CPI from the same month one year ago: data uscpi; set uscpi; pctchng = dif12( cpi ) / lag12( cpi ) * 100; label pctchng = "Percent Change from One Year Ago"; run;
To compute year-over-year percent change measured at a given period within the year, subset the series of percent changes from the same period in the previous year to form a yearly data set. Use an IF or WHERE statement to select observations for the period within each year on which the year-over-year changes are based.
For example, the following statements compute year-over-year percent change in CPI from December of the previous year to December of the current year: data annual; set uscpi; pctchng = dif12( cpi ) / lag12( cpi ) * 100; label pctchng = "Percent Change: December to December"; if month( date ) = 12; format date year4.; run;
Computing Percent Change in Yearly Averages To compute changes in yearly averages, first aggregate the series to an annual series by using the EXPAND procedure, and then compute the percent change of the annual series. (See Chapter 14, “The EXPAND Procedure,” for more information about PROC EXPAND.) For example, the following statements compute percent changes in the annual averages of CPI: proc expand data=uscpi out=annual from=month to=year; convert cpi / observed=average method=aggregate; run; data annual; set annual; pctchng = dif( cpi ) / lag( cpi ) * 100; label pctchng = "Percent Change in Yearly Averages"; run;
It is also possible to compute percent change in the average over the most recent yearly span. For example, the following statements compute monthly percent change in the average of CPI over the most recent 12 months from the average over the previous 12 months: data uscpi; retain sum12 0; drop sum12 ave12 cpilag12; set uscpi; sum12 = sum12 + cpi; cpilag12 = lag12( cpi ); if cpilag12 ^= . then sum12 = sum12 - cpilag12; if lag11( cpi ) ^= . then ave12 = sum12 / 12; pctchng = dif12( ave12 ) / lag12( ave12 ) * 100; label pctchng = "Percent Change in 12 Month Moving Ave."; run;
This example is a complex use of LAG and DIF functions that requires care in handling the initialization of the moving-window averaging process. The LAG12 of CPI is checked for missing values to determine when more than 12 values have been accumulated, and older values must be removed from the moving sum. The LAG11 of CPI is checked for missing values to determine when at least 12 values have been accumulated; AVE12 will be missing when LAG11 of CPI is missing. The DROP statement prevents temporary variables from being added to the data set.
Note that the DIF and LAG functions must execute for every observation, or the queues of remembered values will not operate correctly. The CPILAG12 calculation must be separate from the IF statement. The PCTCHNG calculation must not be conditional on the IF statement. The EXPAND procedure provides an alternative way to compute moving averages.
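For example, the following statements (a sketch; see Chapter 14 for the full list of transformation operators) use the MOVAVE operator to compute a 12-month moving average of CPI without any explicit queue handling:

proc expand data=uscpi out=uscpi method=none;
   id date;
   convert cpi=ave12 / transform=( movave 12 );
run;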
Leading Series Although the SAS System does not provide a function to look ahead at the “next” value of a series, there are a couple of ways to perform this task. The most direct way to compute leads is to use the EXPAND procedure. For example: proc expand data=uscpi out=uscpi method=none; id date; convert cpi=cpilead1 / transform=( lead 1 ); convert cpi=cpilead2 / transform=( lead 2 ); run;
Another way to compute lead series in SAS software is by lagging the time ID variable, renaming the series, and merging the result data set back with the original data set. For example, the following statements add the variable CPILEAD to the USCPI data set. The variable CPILEAD contains the value of CPI in the following month. (The value of CPILEAD is missing for the last observation, of course.) data temp; set uscpi; keep date cpi; rename cpi = cpilead; date = lag( date ); if date ^= .; run; data uscpi; merge uscpi temp; by date; run;
To compute leads at different lead lengths, you must create one temporary data set for each lead length. For example, the following statements compute CPILEAD1 and CPILEAD2, which contain leads of CPI for 1 and 2 periods, respectively: data temp1(rename=(cpi=cpilead1)) temp2(rename=(cpi=cpilead2)); set uscpi; keep date cpi; date = lag( date ); if date ^= . then output temp1; date = lag( date );
if date ^= . then output temp2; run; data uscpi; merge uscpi temp1 temp2; by date; run;
Summing Series Simple cumulative sums are easy to compute using SAS sum statements. The following statements show how to compute the running sum of variable X in data set A, adding XSUM to the data set. data a; set a; xsum + x; run;
The SAS sum statement automatically retains the variable XSUM and initializes it to 0, and the sum statement treats missing values as 0. The sum statement is equivalent to using a RETAIN statement and the SUM function. The previous example could also be written as follows: data a; set a; retain xsum; xsum = sum( xsum, x ); run;
You can also use the EXPAND procedure to compute summations. For example: proc expand data=a out=a method=none; convert x=xsum / transform=( sum ); run;
Like differencing, summation can be done at different lags and can be repeated to produce higher-order sums. To compute sums over observations separated by lags greater than 1, use the LAG and SUM functions together, and use a RETAIN statement that initializes the summation variable to zero. For example, the following statements add the variable XSUM2 to data set A. XSUM2 contains the sum of every other observation, with even-numbered observations containing a cumulative sum of values of X from even observations, and odd-numbered observations containing a cumulative sum of values of X from odd observations.

data a;
   set a;
retain xsum2 0; xsum2 = sum( lag( xsum2 ), x ); run;
Assuming that A is a quarterly data set, the following statements compute running sums of X for each quarter. XSUM4 contains the cumulative sum of X for all observations for the same quarter as the current quarter. Thus, for a first-quarter observation, XSUM4 contains a cumulative sum of current and past first-quarter values. data a; set a; retain xsum4 0; xsum4 = sum( lag3( xsum4 ), x ); run;
To compute higher-order sums, repeat the preceding process and sum the summation variable. For example, the following statements compute the first and second summations of X: data a; set a; xsum + x; x2sum + xsum; run;
The following statements compute the second order four-period sum of X: data a; set a; retain xsum4 x2sum4 0; xsum4 = sum( lag3( xsum4 ), x ); x2sum4 = sum( lag3( x2sum4 ), xsum4 ); run;
You can also use PROC EXPAND to compute cumulative statistics and moving window statistics. See Chapter 14, “The EXPAND Procedure,” for details.
Transforming Time Series It is often useful to transform time series for analysis or forecasting. Many time series analysis and forecasting methods are most appropriate for time series with an unrestricted range, a linear trend, and a constant variance. Series that do not conform to these assumptions can often be transformed to series for which the methods are appropriate. Transformations can be useful for the following:
- range restrictions. Many time series cannot have negative values or can be limited to a maximum possible value. You can often create a transformed series with an unbounded range.

- nonlinear trends. Many economic time series grow exponentially. Exponential growth corresponds to linear growth in the logarithms of the series.

- series variability that changes over time. Various transformations can be used to stabilize the variance.

- nonstationarity. The %DFTEST macro can be used to test a series for nonstationarity which can then be removed by differencing.
Log Transformation The logarithmic transformation is often useful for series that must be greater than zero and that grow exponentially. For example, Figure 3.17 shows a plot of an airline passenger miles series. Notice that the series has exponential growth and the variability of the series increases over time. Airline passenger miles must also be zero or greater. Figure 3.17 Airline Series
The following statements compute the logarithms of the airline series: data lair; set sashelp.air; logair = log( air ); run;
Figure 3.18 shows a plot of the log-transformed airline series. Notice that the log series has a linear trend and constant variance. Figure 3.18 Log Airline Series
The %LOGTEST macro can help you decide if a log transformation is appropriate for a series. See Chapter 5, “SAS Macros and Functions,” for more information about the %LOGTEST macro.
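For example, a call such as the following (a minimal sketch; see Chapter 5 for the available options) applies the test to the airline series. The macro reports its recommendation in the LOGTEST macro variable, which is set to LOG when the log transformation is preferred:

%logtest( sashelp.air, air );
%put &logtest;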
Other Transformations The Box-Cox transformation is a general class of transformations that includes the logarithm as a special case. The %BOXCOXAR macro can be used to find an optimal Box-Cox transformation for a time series. See Chapter 5 for more information about the %BOXCOXAR macro.
The logistic transformation is useful for variables with both an upper and a lower bound, such as market shares. The logistic transformation is useful for proportions, percent values, relative frequencies, or probabilities. The logistic function transforms values between 0 and 1 to values that can range from -∞ to +∞. For example, the following statements transform the variable SHARE from percent values to an unbounded range:

data a;
   set a;
   lshare = log( share / ( 100 - share ) );
run;
Many other data transformations can be used. You can create virtually any desired data transformation using DATA step statements.
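For example, the following statements (a sketch; the square root is one common variance-stabilizing choice for series of counts) apply a square root transformation in the DATA step:

data a;
   set a;
   sqrtx = sqrt( x );   /* square root transformation of X */
run;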
The EXPAND Procedure and Data Transformations The EXPAND procedure provides a convenient way to transform series. For example, the following statements add variables for the logarithm of AIR and the logistic of SHARE to data set A: proc expand data=a out=a method=none; convert air=logair / transform=( log ); convert share=lshare / transform=( / 100 logit ); run;
See Table 14.2 in Chapter 14, “The EXPAND Procedure,” for a complete list of transformations supported by PROC EXPAND.
Manipulating Time Series Data Sets This section discusses merging, splitting, and transposing time series data sets and interpolating time series data to a higher or lower sampling frequency.
Splitting and Merging Data Sets In some cases, you might want to separate several time series that are contained in one data set into different data sets. In other cases, you might want to combine time series from different data sets into one data set.
To split a time series data set into two or more data sets that contain subsets of the series, use a DATA step to create the new data sets and use the KEEP= data set option to control which series are included in each new data set. The following statements split the USPRICE data set shown in a previous example into two data sets, USCPI and USPPI: data uscpi(keep=date cpi) usppi(keep=date ppi); set usprice; run;
If the series have different time ranges, you can subset the time ranges of the output data sets accordingly. For example, if you know that CPI in USPRICE has the range August 1990 through the end of the data set, while PPI has the range from the beginning of the data set through June 1991, you could write the previous example as follows:

data uscpi(keep=date cpi) usppi(keep=date ppi);
   set usprice;
   if date >= '1aug1990'd then output uscpi;
   if date <= '1jun1991'd then output usppi;
run;

Transposing Data Sets

The following statements print a subset of the FOREOUT data set, which contains interleaved time series, and then use the TRANSPOSE procedure to transpose it into a standard form time series data set with one variable for each value of _TYPE_:

title "Original Data Set";
proc print data=foreout(obs=10);
   where date > '1may1991'd & date < '1oct1991'd;
run;

proc transpose data=foreout out=trans(drop=_name_);
   var cpi;
   id _type_;
   by date;
   where date > '1may1991'd & date < '1oct1991'd;
run;

title "Transposed Data Set";
proc print data=trans(obs=10);
run;
The TRANSPOSE procedure adds the variables _NAME_ and _LABEL_ to the output data set. These variables contain the names and labels of the variables that were transposed. In this example, there is only one transposed variable, so _NAME_ has the value CPI for all observations. Thus, _NAME_ and _LABEL_ are of no interest and are dropped from the output data set by using the DROP= data set option. (If none of the variables transposed have a label, PROC TRANSPOSE does not output the _LABEL_ variable and the DROP=_LABEL_ option produces a warning message. You can ignore this message, or you can prevent the message by omitting _LABEL_ from the DROP= list.) The original and transposed data sets are shown in Figure 3.19 and Figure 3.20. (The observation numbers shown for the original data set reflect the operation of the WHERE statement.)

Figure 3.19 Original Data Sets

Original Data Set

Obs    date      _TYPE_     _LEAD_       cpi
 37    JUN1991   ACTUAL        0      136.000
 38    JUN1991   FORECAST      0      136.146
 39    JUN1991   RESIDUAL      0       -0.146
 40    JUL1991   ACTUAL        0      136.200
 41    JUL1991   FORECAST      0      136.566
 42    JUL1991   RESIDUAL      0       -0.366
 43    AUG1991   FORECAST      1      136.856
 44    AUG1991   L95           1      135.723
 45    AUG1991   U95           1      137.990
 46    SEP1991   FORECAST      2      137.443
Figure 3.20 Transposed Data Sets

Transposed Data Set

Obs   date      _LABEL_                    ACTUAL   FORECAST   RESIDUAL       L95       U95
 1    JUN1991   US Consumer Price Index     136.0    136.146   -0.14616         .         .
 2    JUL1991   US Consumer Price Index     136.2    136.566   -0.36635         .         .
 3    AUG1991   US Consumer Price Index         .    136.856          .    135.723   137.990
 4    SEP1991   US Consumer Price Index         .    137.443          .    136.126   138.761
Transposing Cross-Sectional Dimensions The following statements transpose the variable CPI in the CPICITY data set shown in a previous example from time series cross-sectional form to a standard form time series data set. (Only a subset of the data shown in the previous example is used here.) Note that the method shown in this example works only for a single variable. title "Original Data Set"; proc print data=cpicity; run; proc sort data=cpicity out=temp; by date city; run; proc transpose data=temp out=citycpi(drop=_name_); var cpi; id city; by date; run; title "Transposed Data Set"; proc print data=citycpi; run;
The names of the variables in the transposed data sets are taken from the city names in the ID variable CITY. The original and the transposed data sets are shown in Figure 3.21 and Figure 3.22.
Figure 3.21 Original Data Sets

Original Data Set

Obs   city          date     cpi    cpilag
  1   Chicago       JAN90   128.1      .
  2   Chicago       FEB90   129.2   128.1
  3   Chicago       MAR90   129.5   129.2
  4   Chicago       APR90   130.4   129.5
  5   Chicago       MAY90   130.4   130.4
  6   Chicago       JUN90   131.7   130.4
  7   Chicago       JUL90   132.0   131.7
  8   Los Angeles   JAN90   132.1      .
  9   Los Angeles   FEB90   133.6   132.1
 10   Los Angeles   MAR90   134.5   133.6
 11   Los Angeles   APR90   134.2   134.5
 12   Los Angeles   MAY90   134.6   134.2
 13   Los Angeles   JUN90   135.0   134.6
 14   Los Angeles   JUL90   135.6   135.0
 15   New York      JAN90   135.1      .
 16   New York      FEB90   135.3   135.1
 17   New York      MAR90   136.6   135.3
 18   New York      APR90   137.3   136.6
 19   New York      MAY90   137.2   137.3
 20   New York      JUN90   137.1   137.2
 21   New York      JUL90   138.4   137.1
Figure 3.22 Transposed Data Sets

Transposed Data Set

Obs   date    Chicago   Los_Angeles   New_York
 1    JAN90    128.1       132.1        135.1
 2    FEB90    129.2       133.6        135.3
 3    MAR90    129.5       134.5        136.6
 4    APR90    130.4       134.2        137.3
 5    MAY90    130.4       134.6        137.2
 6    JUN90    131.7       135.0        137.1
 7    JUL90    132.0       135.6        138.4
The following statements transpose the CITYCPI data set back to the original form of the CPICITY data set. The variable _NAME_ is added to the data set to tell PROC TRANSPOSE the name of the variable in which to store the observations in the transposed data set. (If the (DROP=_NAME_ _LABEL_) option were omitted from the first PROC TRANSPOSE step, this would not be necessary. PROC TRANSPOSE assumes ID _NAME_ by default.) The NAME=CITY option in the PROC TRANSPOSE statement causes PROC TRANSPOSE to store the names of the transposed variables in the variable CITY. Because PROC TRANSPOSE recodes the values of the CITY variable to create valid SAS variable names in the transposed data set, the values of the variable CITY in the retransposed data set are not the same as in the original.
The retransposed data set is shown in Figure 3.23.

data temp;
   set citycpi;
   _name_ = 'CPI';
run;

proc transpose data=temp out=retrans name=city;
   by date;
run;

proc sort data=retrans;
   by city date;
run;

title "Retransposed Data Set";
proc print data=retrans;
run;
Figure 3.23 Data Set Transposed Back to Original Form

Retransposed Data Set

Obs   date    city          CPI
  1   JAN90   Chicago       128.1
  2   FEB90   Chicago       129.2
  3   MAR90   Chicago       129.5
  4   APR90   Chicago       130.4
  5   MAY90   Chicago       130.4
  6   JUN90   Chicago       131.7
  7   JUL90   Chicago       132.0
  8   JAN90   Los_Angeles   132.1
  9   FEB90   Los_Angeles   133.6
 10   MAR90   Los_Angeles   134.5
 11   APR90   Los_Angeles   134.2
 12   MAY90   Los_Angeles   134.6
 13   JUN90   Los_Angeles   135.0
 14   JUL90   Los_Angeles   135.6
 15   JAN90   New_York      135.1
 16   FEB90   New_York      135.3
 17   MAR90   New_York      136.6
 18   APR90   New_York      137.3
 19   MAY90   New_York      137.2
 20   JUN90   New_York      137.1
 21   JUL90   New_York      138.4
Time Series Interpolation The EXPAND procedure interpolates time series. This section provides a brief summary of the use of PROC EXPAND for different kinds of time series interpolation problems. Most of the issues discussed in this section are explained in greater detail in Chapter 14.
By default, the EXPAND procedure performs interpolation by first fitting cubic spline curves to the available data and then computing needed interpolating values from the fitted spline curves. Other interpolation methods can be requested. Note that interpolating values of a time series does not add any real information to the data because the interpolation process is not the same process that generated the other (nonmissing) values in the series. While time series interpolation can sometimes be useful, great care is needed in analyzing time series that contain interpolated values.
Interpolating Missing Values To use the EXPAND procedure to interpolate missing values in a time series, specify the input and output data sets in the PROC EXPAND statement, and specify the time ID variable in an ID statement. For example, the following statements cause PROC EXPAND to interpolate values for missing values of all numeric variables in the data set USPRICE: proc expand data=usprice out=interpl; id date; run;
Interpolated values are computed only for embedded missing values in the input time series. Missing values before or after the range of a series are ignored by the EXPAND procedure. In the preceding example, PROC EXPAND assumes that all series are measured at points in time given by the value of the ID variable. In fact, the series in the USPRICE data set are monthly averages. PROC EXPAND can produce a better interpolation if this is taken into account. The following example uses the FROM=MONTH option to tell PROC EXPAND that the series is monthly and uses the CONVERT statement with the OBSERVED=AVERAGE to specify that the series values are averages over each month: proc expand data=usprice out=interpl from=month; id date; convert cpi ppi / observed=average; run;
Interpolating to a Higher or Lower Frequency You can use PROC EXPAND to interpolate values of time series at a higher or lower sampling frequency than the input time series. To change the periodicity of time series, specify the time interval of the input data set with the FROM= option, and specify the time interval for the desired output frequency with the TO= option. For example, the following statements compute interpolated weekly values of the monthly CPI and PPI series: proc expand data=usprice out=interpl
from=month to=week; id date; convert cpi ppi / observed=average; run;
Interpolating between Stocks and Flows, Levels and Rates

A distinction is made between variables that are measured at points in time and variables that represent totals or averages over an interval. Point-in-time values are often called stocks or levels. Variables that represent totals or averages over an interval are often called flows or rates. For example, the annual series Gross National Product represents the final goods production over the year and also the yearly average rate of that production. However, the monthly variable Inventory represents the cost of a stock of goods at the end of the month. The EXPAND procedure can convert between point-in-time values and period average or total values. To convert observation characteristics, specify the input and output characteristics with the OBSERVED= option in the CONVERT statement. For example, the following statements use the monthly average price index values in USPRICE to compute interpolated estimates of the price index levels at the midpoint of each month.

proc expand data=usprice out=midpoint from=month;
   id date;
   convert cpi ppi / observed=(average,middle);
run;
Reading Time Series Data Time series data can be coded in many different ways. The SAS System can read time series data recorded in almost any form. Earlier sections of this chapter show how to read time series data coded in several commonly used ways. This section shows how to read time series data from data records coded in two other commonly used ways not previously introduced. Several time series databases distributed by major data vendors can be read into SAS data sets by the DATASOURCE procedure. See Chapter 11, “The DATASOURCE Procedure,” for more information. The SASECRSP, SASEFAME, and SASEHAVR interface engines enable SAS users to access and process time series data in CRSPAccess data files, FAME databases, and Haver Analytics Data Link Express (DLX) data bases, respectively. See Chapter 35, “The SASECRSP Interface Engine,” Chapter 36, “The SASEFAME Interface Engine,” and Chapter 37, “The SASEHAVR Interface Engine,” for more details.
Reading a Simple List of Values

Time series data can be coded as a simple list of values without dating information and with an arbitrary number of observations on each data record. In this case, the INPUT statement must use the trailing "@@" option to retain the current data record after reading the values for each observation, and the time ID variable must be generated with programming statements. For example, the following statements read the USPRICE data set from data records that contain pairs of values for CPI and PPI. This example assumes you know that the first pair of values is for June 1990.

data usprice;
   input cpi ppi @@;
   date = intnx( 'month', '1jun1990'd, _n_-1 );
   format date monyy7.;
datalines;
129.9 114.3   130.4 114.5   131.6 116.5
132.7 118.4   133.5 120.8   133.8 120.1
133.8 118.7   134.6 119.0   134.8 117.2
135.0 116.2   135.2 116.0   135.6 116.5
136.0 116.3   136.2 116.0
;
Reading Fully Described Time Series in Transposed Form

Data for several time series can be coded with separate groups of records for each time series. Data files coded this way are transposed from the form required by SAS procedures. Time series data can also be coded with descriptive information about the series included with the data records.

The following example reads time series data for the USPRICE data set coded with separate groups of records for each series. The data records for each series consist of a series description record and one or more value records. The series description record gives the series name, starting month and year of the series, number of values in the series, and a series label. The value records contain the observations of the time series.

The data are first read into a temporary data set that contains one observation for each value of each series.

data temp;
   length _name_ $8 _label_ $40;
   keep _name_ _label_ date value;
   format date monyy.;
   input _name_ month year nval _label_ &;
   date = mdy( month, 1, year );
   do i = 1 to nval;
      input value @;
      output;
      date = intnx( 'month', date, 1 );
   end;
datalines;
cpi 8 90 12    Consumer Price Index
131.6 132.7 133.5 133.8 133.8 134.6
134.8 135.0 135.2 135.6 136.0 136.2
ppi 6 90 13    Producer Price Index
114.3 114.5 116.5 118.4 120.8 120.1
118.7 119.0 117.2 116.2 116.0 116.5
116.3
;
The following statements sort the data set by date and series name, and the TRANSPOSE procedure is used to transpose the data into a standard form time series data set.

proc sort data=temp;
   by date _name_;
run;

proc transpose data=temp out=usprice(drop=_name_);
   by date;
   var value;
run;

proc contents data=usprice;
run;

proc print data=usprice;
run;
The final data set is shown in Figure 3.25.

Figure 3.24 Contents of USPRICE Data Set

                      Retransposed Data Set

                     The CONTENTS Procedure

           Alphabetic List of Variables and Attributes

    #    Variable    Type    Len    Format    Label

    3    cpi         Num       8              Consumer Price Index
    1    date        Num       8    MONYY.
    2    ppi         Num       8              Producer Price Index

Figure 3.25 Listing of USPRICE Data Set

                      Retransposed Data Set

                 Obs    date      ppi      cpi

                   1    JUN90    114.3       .
                   2    JUL90    114.5       .
                   3    AUG90    116.5    131.6
                   4    SEP90    118.4    132.7
                   5    OCT90    120.8    133.5
                   6    NOV90    120.1    133.8
                   7    DEC90    118.7    133.8
                   8    JAN91    119.0    134.6
                   9    FEB91    117.2    134.8
                  10    MAR91    116.2    135.0
                  11    APR91    116.0    135.2
                  12    MAY91    116.5    135.6
                  13    JUN91    116.3    136.0
                  14    JUL91      .      136.2
Chapter 4
Date Intervals, Formats, and Functions

Contents
    Overview  . . . . . . . . . . . . . . . . . . . . . . . . . . . .  127
    Time Intervals  . . . . . . . . . . . . . . . . . . . . . . . . .  128
        Constructing Interval Names . . . . . . . . . . . . . . . . .  128
        Shifted Intervals . . . . . . . . . . . . . . . . . . . . . .  129
        Beginning Dates and Datetimes of Intervals  . . . . . . . . .  130
        Summary of Interval Types . . . . . . . . . . . . . . . . . .  131
        Examples of Interval Specifications . . . . . . . . . . . . .  134
    Custom Time Intervals . . . . . . . . . . . . . . . . . . . . . .  135
    Date and Datetime Informats . . . . . . . . . . . . . . . . . . .  140
    Date, Time, and Datetime Formats  . . . . . . . . . . . . . . . .  141
        Date Formats  . . . . . . . . . . . . . . . . . . . . . . . .  142
        Datetime and Time Formats . . . . . . . . . . . . . . . . . .  146
    Alignment of SAS Dates  . . . . . . . . . . . . . . . . . . . . .  146
    SAS Date, Time, and Datetime Functions  . . . . . . . . . . . . .  147
    References  . . . . . . . . . . . . . . . . . . . . . . . . . . .  152
Overview This chapter summarizes the time intervals, date and datetime informats, date and datetime formats, and date, time, and datetime functions available in SAS software. The use of these features is explained in Chapter 3, “Working with Time Series Data.” The material in this chapter is also contained in SAS Language Reference: Concepts and SAS Language Reference: Dictionary. Because these features are useful for work with time series data, documentation of these features is consolidated and repeated here for easy reference.
Time Intervals This section provides a reference for the different kinds of time intervals supported by SAS software, but it does not cover how they are used. For an introduction to the use of time intervals, see Chapter 3, “Working with Time Series Data.” Some interval names are used with SAS date values, while other interval names are used with SAS datetime values. The interval names used with SAS date values are YEAR, SEMIYEAR, QTR, MONTH, SEMIMONTH, TENDAY, WEEK, WEEKDAY, DAY, YEARV, R445YR, R454YR, R544YR, R445QTR, R454QTR, R544QTR, R445MON, R454MON, R544MON, and WEEKV. The interval names used with SAS datetime or time values are HOUR, MINUTE, and SECOND. Various abbreviations of these names are also allowed, as described in the section “Summary of Interval Types” on page 131. Interval names for use with SAS date values can be prefixed with ‘DT’ to construct interval names for use with SAS datetime values. The interval names DTYEAR, DTSEMIYEAR, DTQTR, DTMONTH, DTSEMIMONTH, DTTENDAY, DTWEEK, DTWEEKDAY, DTDAY, DTYEARV, DTR445YR, DTR454YR, DTR544YR, DTR445QTR, DTR454QTR, DTR544QTR, DTR445MON, DTR454MON, DTR544MON, and DTWEEKV are used with SAS datetime values.
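For example, the following statements (a minimal sketch) advance a SAS date value by one MONTH interval and a SAS datetime value by one DTMONTH interval with the INTNX function:

data _null_;
   /* MONTH is used with SAS date values (counted in days) */
   d  = intnx( 'month', '17oct1991'd, 1 );
   /* DTMONTH is used with SAS datetime values (counted in seconds) */
   dt = intnx( 'dtmonth', '17oct1991:14:25:32'dt, 1 );
   put d= date9. dt= datetime18.;
run;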
Constructing Interval Names

Multipliers and shift indexes can be used with the basic interval names to construct more complex interval specifications. The general form of an interval name is as follows:

   NAMEn.s

The three parts of the interval name are shown below:

NAME    the name of the basic interval type. For example, YEAR specifies yearly intervals.

n       an optional multiplier that specifies that the interval is a multiple of the period of the basic interval type. For example, the interval YEAR2 consists of two-year (biennial) periods.

s       an optional starting subperiod index that specifies that the intervals are shifted to later starting points. For example, YEAR.3 specifies yearly periods shifted to start on the first of March of each calendar year and to end in February of the following year.
Both the multiplier n and the shift index s are optional and default to 1. For example, YEAR, YEAR1, YEAR.1, and YEAR1.1 are all equivalent ways of specifying ordinary calendar years.
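The following statements sketch this equivalence by computing the start of the year that contains a given date with two of these interval names; both INTNX calls return the same date:

data _null_;
   y1 = intnx( 'year',    '17oct1991'd, 0 );
   y2 = intnx( 'year1.1', '17oct1991'd, 0 );
   put y1= date9. y2= date9.;   /* both print 01JAN1991 */
run;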
To test for a valid interval specification, use the INTTEST function:

   interval = 'MONTH3.2';
   valid = INTTEST( interval );
   valid = INTTEST( 'YEAR4' );

INTTEST returns a value of 0 if the argument is not a valid interval specification and 1 if the argument is a valid interval specification. The INTTEST function can also be used in a DATA step to test an interval before calling an interval function:

   valid = INTTEST( interval );
   if ( valid = 1 ) then do;
      end_date = INTNX( interval, date, 0, 'E' );
      Status = 'Success';
   end;
   if ( valid = 0 ) then Status = 'Failure';
For more information about the INTTEST function, see the SAS Language Reference: Dictionary.
Shifted Intervals

Different kinds of intervals are shifted by different subperiods:

   YEAR, SEMIYEAR, QTR, and MONTH intervals are shifted by calendar months.
   WEEK and DAY intervals are shifted by days.
   SEMIMONTH intervals are shifted by semimonthly periods.
   TENDAY intervals are shifted by 10-day periods.
   YEARV intervals are shifted by WEEKV intervals.
   R445YR, R445QTR, and R445MON intervals are shifted by R445MON intervals.
   R454YR, R454QTR, and R454MON intervals are shifted by R454MON intervals.
   R544YR, R544QTR, and R544MON intervals are shifted by R544MON intervals.
   WEEKV intervals are shifted by days.
   WEEKDAY intervals are shifted by weekdays.
   HOUR intervals are shifted by hours.
   MINUTE intervals are shifted by minutes.
   SECOND intervals are shifted by seconds.
The INTSHIFT function returns the shift interval:

   interval = 'MONTH3.2';
   shift_interval = INTSHIFT( interval );
In this example, the value of shift_interval is ‘MONTH’. For more information about the INTSHIFT function, see the SAS Language Reference: Dictionary. If a subperiod is specified, the shift index cannot be greater than the number of subperiods in the whole interval. For example, you can use YEAR2.24, but YEAR2.25 is an error because there is no 25th month in a two-year interval. For interval types that shift by subperiods that are the same as the basic interval type, only multiperiod intervals can be shifted. For example, MONTH type intervals shift by MONTH subintervals; thus, monthly intervals cannot be shifted because there is only one month in MONTH. However, bimonthly intervals can be shifted because there are two MONTH intervals in each MONTH2 interval. The interval name MONTH2.2 specifies bimonthly periods that start on the first day of even-numbered months.
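The following statements sketch this behavior by computing the beginning of the MONTH2.2 interval that contains a given date; because MONTH2.2 periods start on the first day of even-numbered months, the result is 01OCT1991:

data _null_;
   start = intnx( 'month2.2', '17oct1991'd, 0, 'b' );
   put start= date9.;   /* prints 01OCT1991 */
run;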
Beginning Dates and Datetimes of Intervals Intervals that represent divisions of a year begin with the start of the year (1 January). YEARV, R445YR, R454YR, and R544YR intervals begin with the first week of the International Organization for Standardization (ISO) year, the Monday on or immediately preceding January 4th. R445QTR, R454QTR, and R544QTR intervals begin with the 1st, 14th, 27th, and 40th weeks of the ISO year. MONTH2 periods begin with odd-numbered months (January, March, May, and so on). Likewise, intervals that represent divisions of a day begin with the start of the day (midnight). Thus, HOUR8.7 intervals divide the day into the periods 06:00 to 14:00, 14:00 to 22:00, and 22:00 to 06:00. Intervals that do not nest within years or days begin relative to the SAS date or datetime value 0. The arbitrary reference time of midnight on January 1, 1960, is used as the origin for nonshifted intervals, and shifted intervals are defined relative to that reference point. For example, MONTH13 defines the intervals January 1, 1960, February 1, 1961, March 1, 1962, and so forth, and the intervals December 1, 1959, November 1, 1958, and so on before the base date January 1, 1960. Similarly, the WEEK2 interval begins relative to the Sunday of the week of January 1, 1960. The interval specification WEEK6.13 defines six-week periods that start on second Fridays, and the convention of counting relative to the period that contains January 1, 1960, indicates the starting date or datetime of the interval closest to January 1, 1960, that corresponds to the second Fridays of six-week intervals. Intervals always begin on the date or datetime defined by the base interval name, the multiplier, and the shift value. The end of the interval immediately precedes the beginning of the next interval. However, an interval can be identified by any date or datetime value between its starting and ending values, inclusive. See the section “Alignment of SAS Dates” on page 146 for more information about generating identifying dates for intervals.
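The following statements sketch the HOUR8.7 example described above by computing the beginning and end of the interval that contains 2:25:32 p.m.; that interval runs from 14:00 to 22:00, so the 'B' and 'E' alignments of INTNX return 14:00:00 and 21:59:59 of the same day:

data _null_;
   dt = '17oct1991:14:25:32'dt;
   b = intnx( 'hour8.7', dt, 0, 'b' );
   e = intnx( 'hour8.7', dt, 0, 'e' );
   put b= datetime18. e= datetime18.;
run;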
Summary of Interval Types The interval types are summarized as follows: YEAR
specifies yearly intervals. Abbreviations are YEAR, YEARS, YEARLY, YR, ANNUAL, ANNUALLY, and ANNUALS. The starting subperiod s is in months (MONTH). YEARV
specifies ISO 8601 yearly intervals. The ISO 8601 year starts on the Monday on or immediately preceding January 4th. Note that it is possible for the ISO 8601 year to start in December of the preceding year. Also, some ISO 8601 years contain a leap week. For further discussion of ISO weeks, see Technical Committee ISO/TC 154 (2004). The starting subperiod s is in ISO 8601 weeks (WEEKV). R445YR
is the same as YEARV except that the starting subperiod s is in retail 4-4-5 months (R445MON). R454YR
is the same as YEARV except that the starting subperiod s is in retail 4-5-4 months (R454MON). For a discussion of the retail 4-5-4 calendar, see National Retail Federation (2007). R544YR
is the same as YEARV except that the starting subperiod s is in retail 5-4-4 months (R544MON). SEMIYEAR
specifies semiannual intervals (every six months). Abbreviations are SEMIYEAR, SEMIYEARS, SEMIYEARLY, SEMIYR, SEMIANNUAL, and SEMIANN. The starting subperiod s is in months (MONTH). For example, SEMIYEAR.3 intervals are March–August and September–February. QTR
specifies quarterly intervals (every three months). Abbreviations are QTR, QUARTER, QUARTERS, QUARTERLY, QTRLY, and QTRS. The starting subperiod s is in months (MONTH). R445QTR
specifies retail 4-4-5 quarterly intervals (every 13 ISO 8601 weeks). Some fourth quarters contain a leap week. The starting subperiod s is in retail 4-4-5 months (R445MON). R454QTR
specifies retail 4-5-4 quarterly intervals (every 13 ISO 8601 weeks). Some fourth quarters contain a leap week. For a discussion of the retail 4-5-4 calendar, see National Retail Federation (2007). The starting subperiod s is in retail 4-5-4 months (R454MON). R544QTR
specifies retail 5-4-4 quarterly intervals (every 13 ISO 8601 weeks). Some fourth quarters contain a leap week. The starting subperiod s is in retail 5-4-4 months (R544MON).
MONTH
specifies monthly intervals. Abbreviations are MONTH, MONTHS, MONTHLY, and MON. The starting subperiod s is in months (MONTH). For example, MONTH2.2 intervals are February–March, April–May, June–July, August–September, October–November, and December–January of the following year. R445MON
specifies retail 4-4-5 monthly intervals. The 3rd, 6th, 9th, and 12th months are five ISO 8601 weeks long with the exception that some 12th months contain leap weeks. All other months are four ISO 8601 weeks long. R445MON intervals begin with the 1st, 5th, 9th, 14th, 18th, 22nd, 27th, 31st, 35th, 40th, 44th, and 48th weeks of the ISO year. The starting subperiod s is in retail 4-4-5 months (R445MON). R454MON
specifies retail 4-5-4 monthly intervals. The 2nd, 5th, 8th, and 11th months are five ISO 8601 weeks long. All other months are four ISO 8601 weeks long with the exception that some 12th months contain leap weeks. R454MON intervals begin with the 1st, 5th, 10th, 14th, 18th, 23rd, 27th, 31st, 36th, 40th, 44th, and 49th weeks of the ISO year. For a discussion of the retail 4-5-4 calendar, see National Retail Federation (2007). The starting subperiod s is in retail 4-5-4 months (R454MON). R544MON
specifies retail 5-4-4 monthly intervals. The 1st, 4th, 7th, and 10th months are five ISO 8601 weeks long. All other months are four ISO 8601 weeks long with the exception that some 12th months contain leap weeks. R544MON intervals begin with the 1st, 6th, 10th, 14th, 19th, 23rd, 27th, 32nd, 36th, 40th, 45th, and 49th weeks of the ISO year. The starting subperiod s is in retail 5-4-4 months (R544MON). SEMIMONTH
specifies semimonthly intervals. SEMIMONTH breaks each month into two periods, starting on the 1st and 16th days. Abbreviations are SEMIMONTH, SEMIMONTHS, SEMIMONTHLY, and SEMIMON. The starting subperiod s is in SEMIMONTH periods. For example, SEMIMONTH2.2 specifies intervals from the 16th of one month through the 15th of the next month. TENDAY
specifies 10-day intervals. TENDAY breaks the month into three periods, the 1st through the 10th day of the month, the 11th through the 20th day of the month, and the remainder of the month. (TENDAY is a special interval typically used for reporting automobile sales data.) The starting subperiod s is in TENDAY periods. For example, TENDAY4.2 defines 40-day periods that start at the second TENDAY period. WEEK
specifies weekly intervals of seven days. Abbreviations are WEEK, WEEKS, and WEEKLY. The starting subperiod s is in days (DAY), with the days of the week numbered as 1=Sunday, 2=Monday, 3=Tuesday, 4=Wednesday, 5=Thursday, 6=Friday, and 7=Saturday. For example, WEEK.7 means weekly with Saturday as the first day of the week.
WEEKV
specifies ISO 8601 weekly intervals of seven days. Each week starts on Monday. The starting subperiod s is in days (DAY). Note that WEEKV differs from WEEK in that WEEKV.1 starts on Monday, WEEKV.2 starts on Tuesday, and so forth. WEEKDAY WEEKDAYdW WEEKDAYddW WEEKDAYdddW
specifies daily intervals with weekend days included in the preceding weekday. Note that for a five-day work week that starts on Monday, the appropriate interval is WEEKDAY5.2. Abbreviations are WEEKDAY and WEEKDAYS. The starting subperiod s is in weekdays (WEEKDAY). The WEEKDAY interval is the same as DAY except that weekend days are absorbed into the preceding weekday. Thus, there are five WEEKDAY intervals in a calendar week: Monday, Tuesday, Wednesday, Thursday, and the three-day period Friday-Saturday-Sunday. The default weekend days are Saturday and Sunday, but any one to six weekend days can be listed after the WEEKDAY string and followed by a W. Weekend days are specified as ‘1’ for Sunday, ‘2’ for Monday, and so forth. For example, WEEKDAY67W specifies a Friday-Saturday weekend. WEEKDAY1W specifies a six-day work week with a Sunday weekend. WEEKDAY17W is the same as WEEKDAY. DAY
specifies daily intervals. Abbreviations are DAY, DAYS, and DAILY. The starting subperiod s is in days (DAY). HOUR
specifies hourly intervals. Aliases are HOUR, DTHOUR, HOURS, DTHOURS, HOURLY, DTHOURLY, HR, and DTHR. The starting subperiod s is in hours (HOUR). MINUTE
specifies minute intervals. Aliases are MINUTE, DTMINUTE, MINUTES, DTMINUTES, MIN, and DTMIN. The starting subperiod s is in minutes (MINUTE). SECOND
specifies second intervals. Aliases are SECOND, DTSECOND, SECONDS, DTSECONDS, SEC, and DTSEC. The starting subperiod s is in seconds (SECOND).
Examples of Interval Specifications

Table 4.1 shows examples of different kinds of interval specifications.

Table 4.1  Examples of Intervals

Name           Description of Interval
YEAR           Years that start in January
YEAR.10        Years that start in October
YEAR2.7        Biennial intervals that start in July of even years
YEAR2.19       Biennial intervals that start in July of odd years
YEAR4.11       Four-year intervals that start in November of leap years
               (frequency of U.S. presidential elections)
YEAR4.35       Four-year intervals that start in November of even years
               between leap years (frequency of U.S. midterm elections)
YEARV          Years that start on the Monday on or immediately preceding
               January 4th
YEARV.2        Years that start on the Monday immediately following
               January 4th
R445MON        Months that start on the 1st, 5th, 9th, 14th, 18th, 22nd,
               27th, 31st, 35th, 40th, 44th, and 48th Monday of the year.
               The 1st Monday is the Monday on or immediately preceding
               January 4th
R445MON3       Three-month intervals that start on the 1st, 14th, 27th,
               and 40th Monday of the year. This is equivalent to R445QTR
R445MON3.2     Three-month intervals that start on the 5th, 18th, 31st,
               and 44th Monday of the year. This is equivalent to
               R445QTR.2
WEEK           Weekly intervals that start on Sundays
WEEK2          Biweekly intervals that start on first Sundays
WEEK1.1        Same as WEEK
WEEK.2         Weekly intervals that start on Mondays
WEEK6.3        Six-week intervals that start on first Tuesdays
WEEK6.11       Six-week intervals that start on second Wednesdays
WEEKDAY        Daily with Friday-Saturday-Sunday counted as the same day
               (five-day work week with a Saturday-Sunday weekend)
WEEKDAY17W     Same as WEEKDAY
WEEKDAY5.2     Five weekdays that start on Monday. If WEEKDAY data are
               accumulated into weekly data, the interval of the
               accumulated data is WEEKDAY5.2
WEEKDAY67W     Daily with Thursday-Friday-Saturday counted as the same
               day (five-day work week with a Friday-Saturday weekend)
WEEKDAY1W      Daily with Saturday-Sunday counted as the same day
               (six-day work week with a Sunday weekend)
WEEKDAY3.2     Three-weekday intervals (with Friday-Saturday-Sunday
               counted as one weekday) with the cycle of three-weekday
               periods aligned to Monday, January 4, 1960
HOUR8.7        Eight-hour intervals that start at 6 a.m., 2 p.m., and
               10 p.m. (might be used for work shifts)
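As a minimal sketch of how one of these specifications can be used, the following statements generate the first four U.S. presidential election years starting from November 2000 by stepping through YEAR4.11 intervals (the starting date and data set name are illustrative):

data elections(keep=electionYear);
   do i = 0 to 3;
      d = intnx( 'year4.11', '01nov2000'd, i );
      electionYear = year(d);   /* 2000, 2004, 2008, 2012 */
      output;
   end;
run;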
Custom Time Intervals

The standard time intervals described in the previous sections do not always fit the data. For example, you might want to use fiscal months that begin on the 10th of each month, but the MONTH interval begins on the 1st of each month. Or you might collect data hourly for a business that is closed at night, but using the DTHOUR interval results in gaps in the data that can cause problems in standard time series analysis. In another case, you might wish to calculate the number of business days between dates, excluding holidays and weekends, but holidays are counted when you use the INTCK function with the WEEKDAY interval. For more information about the INTCK function, see “Interval Functions INTNX and INTCK” on page 97.

Time series can be analyzed using observation numbers as the identifying reference. However, it is often desirable to maintain the time stamp for other types of modeling such as regression variables based on time or reconciliation.

To address these issues, you can define custom intervals within a given SAS program. The use of custom intervals requires the following two steps for each interval:

1. Associate a data set name with a custom interval name by using the INTERVALDS= system option. For more information about the INTERVALDS= option, see the SAS Language Reference: Dictionary. The following example associates the data set StoreHoursDS with the custom interval StoreHours.

      options intervalds=(StoreHours=StoreHoursDS);

2. Create a data set that describes the custom interval. The data set must contain a BEGIN variable. It can also contain an END and a SEASON variable. It should contain a FORMAT statement for the BEGIN variable that specifies a SAS date, SAS datetime, or numeric format that matches the BEGIN variable data. If the END variable is present, it should also be included in the FORMAT statement. A numeric format that is not a SAS date or SAS datetime format indicates that the values are observation numbers.

   If the END variable is not present, then the implied value of END at each observation is one less than the value of BEGIN at the next observation. The span of the custom interval data set should include any dates or times that are necessary for performing calculations on the time series, including backcasting, forecasting, and other operations that might extend beyond the series (such as filters).

After the two preceding steps have been completed, the custom interval can be specified in SAS procedures and functions where a standard time interval can be specified.
The following DATA step creates the StoreHoursDS data set, which is appropriate for a business that is open 9AM to 6PM Monday through Friday and Saturday 9AM to 1PM:

options intervalds=(StoreHours=StoreHoursDS);

data StoreHoursDS(keep=BEGIN END);
   start = '01JAN2009'D;
   stop  = '31DEC2009'D;
   do date = start to stop;
      dow = WEEKDAY(date);
      datetime = dhms(date,0,0,0);
      if dow not in (1,7) then
         do hour = 9 to 17;
            begin = intnx('hour',datetime,hour,'b');
            end   = intnx('hour',datetime,hour,'e');
            output;
         end;
      else if dow = 7 then
         do hour = 9 to 12;
            begin = intnx('hour',datetime,hour,'b');
            end   = intnx('hour',datetime,hour,'e');
            output;
         end;
   end;
   format BEGIN END DATETIME.;
run;

title 'Store Hours Custom Interval';
proc print data=StoreHoursDS(obs=18);
run;
The first 18 observations of the custom interval data set are shown in Figure 4.1.

Figure 4.1 Store Hours Custom Interval

                 Store Hours Custom Interval

       Obs        begin                end

         1    01JAN09:09:00:00    01JAN09:09:59:59
         2    01JAN09:10:00:00    01JAN09:10:59:59
         3    01JAN09:11:00:00    01JAN09:11:59:59
         4    01JAN09:12:00:00    01JAN09:12:59:59
         5    01JAN09:13:00:00    01JAN09:13:59:59
         6    01JAN09:14:00:00    01JAN09:14:59:59
         7    01JAN09:15:00:00    01JAN09:15:59:59
         8    01JAN09:16:00:00    01JAN09:16:59:59
         9    01JAN09:17:00:00    01JAN09:17:59:59
        10    02JAN09:09:00:00    02JAN09:09:59:59
        11    02JAN09:10:00:00    02JAN09:10:59:59
        12    02JAN09:11:00:00    02JAN09:11:59:59
        13    02JAN09:12:00:00    02JAN09:12:59:59
        14    02JAN09:13:00:00    02JAN09:13:59:59
        15    02JAN09:14:00:00    02JAN09:14:59:59
        16    02JAN09:15:00:00    02JAN09:15:59:59
        17    02JAN09:16:00:00    02JAN09:16:59:59
        18    02JAN09:17:00:00    02JAN09:17:59:59
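Once defined, the StoreHours interval can be used wherever a standard interval name is accepted. The following statements are a minimal sketch: starting from a datetime value within the last store hour on a Friday, INTNX with the custom interval should advance to the first store hour of the following Saturday, skipping the overnight gap.

data _null_;
   dt = '02jan2009:17:30:00'dt;           /* Friday, during the last store hour     */
   next = intnx( 'StoreHours', dt, 1 );   /* next interval begins Saturday at 9 a.m. */
   put next= datetime18.;
run;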
The following DATA step creates the FMDS data set to define a custom interval FiscalMonth, which is appropriate for a business that uses fiscal months that start on the 10th of each month. The SAME alignment option of the INTNX function specifies that the dates generated by the INTNX function are the same day of the month as the date in the start variable. For more information about the INTNX function, see “SAS Date, Time, and Datetime Functions” on page 147. The MONTH function assigns the month of the BEGIN variable to the SEASON variable. This specifies monthly seasonality.

options intervalds=(FiscalMonth=FMDS);

data FMDS(keep=BEGIN SEASON);
   start = '10JAN1999'D;
   stop  = '10JAN2001'D;
   nmonths = INTCK('MONTH',start,stop);
   do i = 0 to nmonths;
      BEGIN = INTNX('MONTH',start,i,'S');
      SEASON = MONTH(BEGIN);
      output;
   end;
   format BEGIN DATE.;
run;
The difference between the custom FiscalMonth interval and a standard interval can be seen in the following example. The output shown in Figure 4.2 compares how the data are accumulated. For the FiscalMonth interval, values in the first nine days of the month are accumulated with the interval that begins in the previous month. For the standard MONTH interval, values in the first nine days of the month are accumulated with the calendar month.

data sales(keep=DATE sales);
   do date = '01JAN2000'D to '31DEC2000'D;
      month = MONTH(date);
      dayofmonth = DAY(date);
      sales = 0;
      if ( dayofmonth lt 10 ) then sales = month/9;
      output;
   end;
   format date monyy.;
run;

proc timeseries data=sales out=dataInFiscalMonths;
   id DATE interval=FiscalMonth accumulate=total;
   var sales;
run;

proc timeseries data=sales out=dataInStdMonths;
   id DATE interval=Month accumulate=total;
   var sales;
run;

data compare;
   merge dataInFiscalMonths(rename=(sales=FM_sales))
         dataInStdMonths(rename=(sales=SM_sales));
   by DATE;
run;
title 'Standard Monthly Data vs. Fiscal Month Data';
proc print data=compare;
run;
Figure 4.2 Fiscal Months Custom Interval

           Standard Monthly Data vs. Fiscal Month Data

       Obs    date           FM_sales    SM_sales

         1    10-DEC-1999        1           .
         2    01-JAN-2000        .           1
         3    10-JAN-2000        2           .
         4    01-FEB-2000        .           2
         5    10-FEB-2000        3           .
         6    01-MAR-2000        .           3
         7    10-MAR-2000        4           .
         8    01-APR-2000        .           4
         9    10-APR-2000        5           .
        10    01-MAY-2000        .           5
        11    10-MAY-2000        6           .
        12    01-JUN-2000        .           6
        13    10-JUN-2000        7           .
        14    01-JUL-2000        .           7
        15    10-JUL-2000        8           .
        16    01-AUG-2000        .           8
        17    10-AUG-2000        9           .
        18    01-SEP-2000        .           9
        19    10-SEP-2000       10           .
        20    01-OCT-2000        .          10
        21    10-OCT-2000       11           .
        22    01-NOV-2000        .          11
        23    10-NOV-2000       12           .
        24    01-DEC-2000        .          12
        25    10-DEC-2000        0           .
The next example uses custom intervals in the time function INTCK to omit holidays when counting business days. The result is shown in Figure 4.3.

options intervalds=(BankingDays=BankDayDS);

data BankDayDS(keep=BEGIN);
   start = '15DEC1998'D;
   stop  = '15JAN2002'D;
   nwkdays = INTCK('WEEKDAY',start,stop);
   do i = 0 to nwkdays;
      BEGIN = INTNX('WEEKDAY',start,i);
      year = YEAR(BEGIN);
      if BEGIN ne HOLIDAY("NEWYEAR",year)        and
         BEGIN ne HOLIDAY("MLK",year)            and
         BEGIN ne HOLIDAY("USPRESIDENTS",year)   and
         BEGIN ne HOLIDAY("MEMORIAL",year)       and
         BEGIN ne HOLIDAY("USINDEPENDENCE",year) and
         BEGIN ne HOLIDAY("LABOR",year)          and
         BEGIN ne HOLIDAY("COLUMBUS",year)       and
         BEGIN ne HOLIDAY("VETERANS",year)       and
         BEGIN ne HOLIDAY("THANKSGIVING",year)   and
         BEGIN ne HOLIDAY("CHRISTMAS",year)      then output;
   end;
   format BEGIN DATE.;
run;

data CountDays;
   start = '01JAN1999'D;
   stop  = '31DEC2001'D;
   ActualDays = INTCK('DAYS',start,stop);
   Weekdays   = INTCK('WEEKDAYS',start,stop);
   BankDays   = INTCK('BankingDays',start,stop);
   format start stop DATE.;
run;

title 'Methods of Counting Days';
proc print data=CountDays;
run;
Figure 4.3 Bank Days Custom Interval

                    Methods of Counting Days

                               Actual
    Obs    start      stop      Days     Weekdays    Bank Days

      1    01JAN99    31DEC01    1095       781          757
Date and Datetime Informats

Table 4.2 lists some of the SAS date and datetime informats available to read date, time, and datetime values. See Chapter 3, “Working with Time Series Data,” for a discussion of the use of date and datetime informats. See SAS Language Reference: Concepts for a complete description of these informats.

For each informat, Table 4.2 shows an example of a date or datetime value written in the style that the informat is designed to read. You can specify the width of each informat by adding w. For informats that include second values, you can specify the number of decimal digits for seconds by adding d. Table 4.2 shows the width range allowed by the informat and the default width. The date 17 October 1991 and the time 2:25:32 p.m. are used for the example in all cases.

Table 4.2  Frequently Used SAS Date and Datetime Informats

Informat and         Description                             Width     Default
Example                                                      Range     Width

ANYDTDTEw.           Reads and extracts the date value        5–32        9
                     from any of the following: DATE,
                     DATETIME, DDMMYY, JULIAN,
                     MDYAMPM, MMDDYY, MMxYY*,
                     MONYY, TIME, YMDDTTM,
                     YYMMDD, YYQ, YYxMM*,
                     month-day-year

ANYDTDTMw.           Reads and extracts the datetime          1–32       19
                     value from any of the following:
                     DATE, DATETIME, DDMMYY,
                     JULIAN, MMDDYY, MMxYY*,
                     MONYY, TIME, YYMMDD, YYQ,
                     YYxMM*, month-day-year

ANYDTTMEw.           Reads and extracts the time value        1–32        8
                     from any of the following: DATE,
                     DATETIME, DDMMYY, JULIAN,
                     MMDDYY, MONYY, TIME,
                     YYMMDD, YYQ, month-day-year

DATEw.               Day, month abbreviation, and year:       7–32        7
17oct91              ddmonyy

DATETIMEw.d          Date and time:                          13–40       18
17oct91:14:25:32     ddmonyy:hh:mm:ss

DDMMYYw.             Day, month, year: ddmmyy,                6–32        6
17/10/91             dd/mm/yy, dd-mm-yy, or dd mm yy

JULIANw.             Year and day of year (Julian             5–32        5
91290                dates): yyddd

MMDDYYw.             Month, day, year: mmddyy,                6–32        6
10/17/91             mm/dd/yy, mm-dd-yy, or mm dd yy

MONYYw.              Month abbreviation and year:             5–32        5
Oct91                monyy

NENGOw.              Japanese Nengo notation                  7–32       10
H.03/10/17

TIMEw.d              Hours, minutes, seconds: hh:mm:ss        5–32        8
14:25:32             or hours, minutes: hh:mm

WEEKVw.              ISO 8601 year, week, day of week:        3–200      11
1991-W42-04          yyyy-Www-dd

YYMMDDw.             Year, month, day: yymmdd,                6–32        6
91/10/17             yy/mm/dd, yy-mm-dd, or yy mm dd

YYQw.                Year and quarter of year: yyQq           4–32        4
91Q4
Date, Time, and Datetime Formats Some of the commonly used SAS date and datetime formats are listed in Table 4.3 and Table 4.4. You can specify the width value for each format by adding w. The tables list the range of width values allowed and the default width value for each format. The notation used by a format is abbreviated in different ways depending on the width option used. For example, the format MMDDYY8. writes the date 17 October 1991 as 10/17/91, while the format MMDDYY6. writes this date as 101791. In particular, formats that display the year show two-digit or four-digit year values depending on the width option. The examples shown in the tables use the default width. The interval function INTFMT returns a recommended format for time ID values based on the interval that describes the frequency of the values. The following example uses INTFMT to select a format to display the quarterly time ID variable qtrDate. In this example, INTFMT returns the format YYQC6., which displays the year in four digits and the quarter in a single digit. This selected format is stored in a macro variable that is created by the CALL SYMPUT statement. The second argument to INTFMT controls the width of the year for date formats; it can take the value ‘long’ or ‘l’ to indicate 4 for the year width or the value ‘short’ or ‘s’ to indicate 2 for the year width. For more
information about the INTFMT function, see the SAS Language Reference: Dictionary. For more information about the CALL SYMPUT statement, see the SAS Language Reference: Dictionary.

The macro variable &FMT is then used in the FORMAT statement in the PROC PRINT step as follows:

data b(keep=qtrDate);
   interval = 'QTR';
   form = INTFMT( interval, 'long' );
   call symput('fmt',form);
   do i=1 to 4;
      qtrDate = INTNX( interval, '01jan00'd, i-1 );
      output;
   end;
run;

proc print;
   format qtrDate &fmt;
run;
See SAS Language Reference: Concepts for a complete description of these formats, including the variations of the formats produced by different width options. See Chapter 3, “Working with Time Series Data,” for a discussion of the use of date and datetime formats.
Date Formats

Table 4.3 lists some of the available SAS date formats. For each format, an example is shown of a date value in the notation produced by the format. The date ‘17OCT91’D is used as the example.

Table 4.3  Frequently Used SAS Date Formats

Format and           Description                             Width     Default
Example                                                      Range     Width

DATEw.               Day, month abbreviation, year:           5–9         7
17OCT91              ddmonyy

DAYw.                Day of month                             2–32        2
17

DDMMYYw.             Day, month, year: dd/mm/yy               2–8         8
17/10/91

DOWNAMEw.            Name of day of the week                  1–32        9
Thursday

JULDAYw.             Day of year                              3–32        3
290

JULIANw.             Year and day of year: yyddd              5–7         5
91290

MMDDYYw.             Month, day, year: mm/dd/yy               2–8         8
10/17/91

MMYYw.               Month and year: mmMyyyy                  5–32        7
10M1991

MMYYCw.              Month and year: mm:yyyy                  5–32        7
10:1991

MMYYDw.              Month and year: mm-yyyy                  5–32        7
10-1991

MMYYPw.              Month and year: mm.yyyy                  5–32        7
10.1991

MMYYSw.              Month and year: mm/yyyy                  5–32        7
10/1991

MMYYNw.              Month and year: mmyyyy                   5–32        6
101991

MONNAMEw.            Name of month                            1–32        9
October

MONTHw.              Month of year                            1–32        2
10

MONYYw.              Month abbreviation and year:             5–7         5
OCT91                monyy

QTRw.                Quarter of year                          1–32        1
4

QTRRw.               Quarter in roman numerals                3–32        3
IV

NENGOw.              Japanese Nengo notation                  2–10       10
H.03/10/17

WEEKDATEw.           day-of-week, month-name dd, yyyy         3–37       29
Thursday, October 17, 1991

WEEKDATXw.           day-of-week, dd month-name yyyy          3–37       29
Thursday, 17 October 1991

WEEKDAYw.            Day of week                              1–32        1
5

WEEKVw.              ISO 8601 year, week, day of week:        3–200      11
1991-W42-04          yyyy-Www-dd

WORDDATEw.           month-name dd, yyyy                      3–32       18
October 17, 1991

WORDDATXw.           dd month-name yyyy                       3–32       18
17 October 1991

YEARw.               Year: yyyy                               2–32        4
1991

YYMMw.               Year and month: yyyyMmm                  5–32        7
1991M10

YYMMCw.              Year and month: yyyy:mm                  5–32        7
1991:10

YYMMDw.              Year and month: yyyy-mm                  5–32        7
1991-10

YYMMPw.              Year and month: yyyy.mm                  5–32        7
1991.10

YYMMSw.              Year and month: yyyy/mm                  5–32        7
1991/10

YYMMNw.              Year and month: yyyymm                   5–32        7
199110

YYMONw.              Year and month abbreviation:             5–32        7
1991OCT              yyyymon

YYMMDDw.             Year, month, day: yy/mm/dd               2–8         8
91/10/17

YYQw.                Year and quarter: yyyyQq                 4–6         6
1991Q4

YYQCw.               Year and quarter: yyyy:q                 4–32        6
1991:4

YYQDw.               Year and quarter: yyyy-q                 4–32        6
1991-4

YYQPw.               Year and quarter: yyyy.q                 4–32        6
1991.4

YYQSw.               Year and quarter: yyyy/q                 4–32        6
1991/4

YYQNw.               Year and quarter: yyyyq                  3–32        5
19914

YYQRw.               Year and quarter in roman                6–32        8
1991QIV              numerals: yyyyQrr

YYQRCw.              Year and quarter in roman                6–32        8
1991:IV              numerals: yyyy:rr

YYQRDw.              Year and quarter in roman                6–32        8
1991-IV              numerals: yyyy-rr

YYQRPw.              Year and quarter in roman                6–32        8
1991.IV              numerals: yyyy.rr

YYQRSw.              Year and quarter in roman                6–32        8
1991/IV              numerals: yyyy/rr

YYQRNw.              Year and quarter in roman                6–32        8
1991IV               numerals: yyyyrr
Datetime and Time Formats

Table 4.4 lists some of the available SAS datetime and time formats. For each format, the example shows the formatted value. The value of the variable dt is ‘17OCT91:14:25:32’DT. You can specify the width of each format by adding w. For formats that allow a decimal value, you can specify the number of decimal digits by adding d.

Table 4.4  Frequently Used SAS Datetime and Time Formats

Format         Value and Example                Description               Width     Default
                                                                          Range     Width

DATETIMEw.d    dt                               ddmonyy:hh:mm:ss.ss       7–40       16
               17OCT91:14:25:32

DTWKDATXw.     dt                               day-of-week, dd month     3–37       29
               Thursday, 17 October 1991        yyyy

HHMMw.d        TIMEPART(dt)                     Hour and minute:          2–20        5
               14:26                            hh:mm.mm

HOURw.d        TIMEPART(dt)                     Hour: hh.hh               2–20        2
               14

MMSSw.d        HMS(0,MINUTE(dt),SECOND(dt))     Minutes and seconds:      2–20        5
               25:32                            mm:ss.ss

TIMEw.d        TIMEPART(dt)                     Time of day:              2–20        8
               14:25:32                         hh:mm:ss.ss

TODw.d         dt                               Time of day:              2–20        8
               14:25:32                         hh:mm:ss.ss
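The following statements are a minimal sketch of applying two of these formats and the TIMEPART function to the datetime value used in Table 4.4:

data _null_;
   dt = '17oct91:14:25:32'dt;
   t  = timepart( dt );
   put dt= datetime16. t= time8.;
run;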
Alignment of SAS Dates SAS date values that are used to identify time series observations produced by SAS/ETS and SAS High-Performance Forecasting procedures are normally aligned with the beginning of the time intervals that correspond to the observations. For example, for monthly data for 1994, the date values that identify the observations are 1Jan94, 1Feb94, 1Mar94, . . . , 1Dec94. However, for some applications it might be preferable to use end-of-period dates, such as 31Jan94, 28Feb94, 31Mar94, . . . , 31Dec94. For other applications, such as plotting time series, it might be more convenient to use interval midpoint dates to identify the observations.
Many SAS/ETS and SAS High-Performance Forecasting procedures provide an ALIGN= option to control the alignment of dates for outputting time series observations. SAS/ETS procedures that support the ALIGN= option are ARIMA, DATASOURCE, ESM, EXPAND, FORECAST, SIMILARITY, TIMESERIES, UCM, and VARMAX. SAS High-Performance Forecasting procedures that support the ALIGN= option are HPFRECONCILE, HPF, HPFDIAGNOSE, HPFENGINE, and HPFEVENTS.

ALIGN=

The ALIGN= option can have the following values:

BEGINNING    specifies that dates be aligned to the start of the interval. This is the default. BEGINNING can be abbreviated as BEGIN, BEG, or B.

MIDDLE       specifies that dates be aligned to the interval midpoint, the average of the beginning and ending values. MIDDLE can be abbreviated as MID or M.

ENDING       specifies that dates be aligned to the end of the interval. ENDING can be abbreviated as END or E.
For information about the calculation of the beginning and ending values of intervals, see the section “Beginning Dates and Datetimes of Intervals” on page 130.
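These alignments correspond to the 'B', 'M', and 'E' alignment arguments of the INTNX function. The following statements are a minimal sketch that computes all three alignments of the MONTH interval that contains 17 October 1991:

data _null_;
   date = '17oct1991'd;
   b = intnx( 'month', date, 0, 'b' );   /* 01OCT1991 */
   m = intnx( 'month', date, 0, 'm' );   /* 16OCT1991 */
   e = intnx( 'month', date, 0, 'e' );   /* 31OCT1991 */
   put b= date9. m= date9. e= date9.;
run;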
SAS Date, Time, and Datetime Functions

SAS date, time, and datetime functions are used to perform the following tasks:

   compute date, time, and datetime values from calendar and time-of-day values
   compute calendar and time-of-day values from date and datetime values
   convert between date, time, and datetime values
   perform calculations that involve time intervals
   provide information about time intervals
   provide information about seasonality

For all interval functions, you can supply the intervals and other character arguments either directly as a quoted string or as a SAS character variable. When you use a character variable, you should set the length of the character variable to at least the length of the longest string for that variable that is used in the DATA step.

Also, to ensure correct results when using interval functions, use date intervals with date values and datetime intervals with datetime values.

See SAS Language Reference: Dictionary for a complete description of these functions.
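The following statements are a minimal sketch of supplying an interval name through a character variable whose declared length covers the longest interval name used:

data _null_;
   length interval $ 8;                        /* long enough for the names used */
   interval = 'QTR';
   next = intnx( interval, '17oct1991'd, 1 );  /* start of the next quarter */
   put next= date9.;                           /* prints 01JAN1992 */
run;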
The following list shows SAS date, time, and datetime functions in alphabetical order. DATE()
returns today’s date as a SAS date value. DATEJUL( yyddd )
returns the SAS date value when given the Julian date in yyddd or yyyyddd format. For example, DATE = DATEJUL(99001); assigns the SAS date value ‘01JAN99’D to DATE, and DATE = DATEJUL(1999365); assigns the SAS date value ‘31DEC1999’D to DATE. DATEPART( datetime )
returns the date part of a SAS datetime value as a date value. DATETIME()
returns the current date and time of day as a SAS datetime value. DAY( date )
returns the day of the month from a SAS date value. DHMS( date, hour, minute, second )
returns a SAS datetime value for date, hour, minute, and second values. HMS( hour, minute, second )
returns a SAS time value for hour, minute, and second values. HOLIDAY( ‘holiday ’, year )
returns a SAS date value for the holiday and year specified. Valid values for holiday are ‘BOXING’, ‘CANADA’, ‘CANADAOBSERVED’, ‘CHRISTMAS’, ‘COLUMBUS’, ‘EASTER’, ‘FATHERS’, ‘HALLOWEEN’, ‘LABOR’, ‘MLK’, ‘MEMORIAL’, ‘MOTHERS’, ‘NEWYEAR’, ‘THANKSGIVING’, ‘THANKSGIVINGCANADA’, ‘USINDEPENDENCE’, ‘USPRESIDENTS’, ‘VALENTINES’, ‘VETERANS’, ‘VETERANSUSG’, ‘VETERANSUSPS’, and ‘VICTORIA’. For example:

   EASTER2000 = HOLIDAY('EASTER', 2000);

HOUR( datetime )
returns the hour from a SAS datetime or time value. INTCINDEX( ‘date-interval’, date ) INTCINDEX( ‘datetime-interval’, datetime )
returns the index of the seasonal cycle when given an interval and an appropriate SAS date, datetime, or time value. For example, the seasonal cycle for INTERVAL=‘DAY’ is ‘WEEK’, so INTCINDEX(’DAY’,’01SEP78’D); returns 35 because September 1, 1978, is the sixth day of the 35th week of the year. For correct results, date intervals should be used with date values, and datetime intervals should be used with datetime values. INTCK( ‘date-interval’, date1, date2 < , ‘method’ > ) INTCK( ‘datetime-interval’, datetime1, datetime2 < , ‘method’ > )
returns the number of boundaries of intervals of the given kind that lie between the two date or datetime values. The optional method argument specifies that the intervals are counted using either a discrete or a continuous method. The default DISCRETE (or DISC or D) method uses discrete time intervals. For the DISCRETE method, the distance in MONTHS between
January 31, 2000, and February 1, 2000, is one month. The CONTINUOUS (or CONT or C) method uses continuous time intervals. For the CONTINUOUS method, the distance in MONTHS between January 15, 2000, and February 14, 2000, is zero, but the distance in MONTHS between January 15, 2000, and February 15, 2000, is one month.
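The following statements are a minimal sketch that contrasts the two methods for the dates mentioned above:

data _null_;
   d1 = intck( 'month', '31jan2000'd, '01feb2000'd, 'd' );   /* 1: a month boundary is crossed */
   c1 = intck( 'month', '15jan2000'd, '14feb2000'd, 'c' );   /* 0: less than one full month    */
   c2 = intck( 'month', '15jan2000'd, '15feb2000'd, 'c' );   /* 1: exactly one full month      */
   put d1= c1= c2=;
run;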
INTCYCLE( ‘interval’ )

returns the interval of the seasonal cycle, given a date, time, or datetime interval. For example, INTCYCLE(‘MONTH’) returns ‘YEAR’ because the months January, February, . . . , December constitute a yearly cycle. INTCYCLE(‘DAY’) returns ‘WEEK’ because Sunday, Monday, . . . , Saturday is a weekly cycle. INTFIT( date1, date2, ‘D’ ) INTFIT( datetime1, datetime2, ‘DT’ ) INTFIT( obs1, obs2, ‘OBS’ )
returns an interval that fits exactly between two SAS date, datetime, or observation values. That is, if the interval result of the INTFIT function is used with date1, 1, and SAMEDAY alignment in the INTNX function, then the result is date2. This concept is illustrated in the following example, where result1 is the same as date1 and result2 is the same as date2.

   FitInterval = INTFIT( date1, date2, 'D' );
   result1 = INTNX( FitInterval, date1, 0, 'SAMEDAY');
   result2 = INTNX( FitInterval, date1, 1, 'SAMEDAY');
More than one interval can fit the preceding definition. For example, two SAS date values that are seven days apart could be fit with either ‘DAY7’ or ‘WEEK’. The INTFIT function chooses the more common interval, so ‘WEEK’ is the result when the dates are seven days apart. The INTFIT function can be used to detect the possible frequency of the time series or to analyze frequencies of other events in a time series, such as outliers or missing values. INTFMT(‘interval’ ,‘size’)
returns a recommended format when given a date, time, or datetime interval for displaying the time ID values associated with a time series of the given interval. The second argument to INTFMT controls the width of the year for date formats; it can take the value ‘long’ or ‘l’ to specify that the returned format display a four-digit year or the value ‘short’ or ‘s’ to specify that the returned format display a two-digit year. INTGET( date1, date2, date3 ) INTGET( datetime1, datetime2, datetime3 )
returns an interval that fits three consecutive SAS date or datetime values. The INTGET function examines two intervals: the first interval between date1 and date2, and the second interval between date2 and date3. In order for an interval to be detected, either the two intervals must be the same or one interval must be an integer multiple of the other interval. That is, INTGET assumes that at least two of the dates are consecutive points in the time series, and that the other two dates are also consecutive or represent the points before and after missing observations. The INTGET function assumes that large values are SAS datetime values, which are measured in seconds, and that smaller values are SAS date values, which are measured in days. The INTGET function can be used to detect the possible frequency of the time series or to analyze frequencies of other events in a time series, such as outliers or missing values.
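The following statements are a minimal sketch of the INTGET function; because the second gap (two months) is an integer multiple of the first (one month), the detected interval is MONTH:

data _null_;
   interval = intget( '01jan2000'd, '01feb2000'd, '01apr2000'd );
   put interval=;   /* prints MONTH */
run;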
INTINDEX( ‘date-interval’, date ) INTINDEX( ‘datetime-interval’, datetime )
returns the seasonal index when given a date or datetime interval and an appropriate date or datetime value. The seasonal index is a number that represents the position of the date or datetime value in the seasonal cycle of the specified interval. For example, INTINDEX(’MONTH’,’01DEC2000’D); returns 12 because monthly data is yearly periodic and DECEMBER is the 12th month of the year. However, INTINDEX(’DAY’,’01DEC2000’D); returns 6 because daily data is weekly periodic and December 01, 2000, is a Friday, the sixth day of the week. To correctly identify the seasonal index, the interval specification should agree with the date or datetime value. For example, INTINDEX(’DTMONTH’,’01DEC2000’D); and INTINDEX(’MONTH’,’01DEC2000:00:00:00’DT); do not return the expected value of 12. However, both INTINDEX(’MONTH’,’01DEC2000’D); and INTINDEX(’DTMONTH’,’01DEC2000:00:00:00’DT); return the expected value of 12. INTNX( ‘date-interval’, date, n < , ‘alignment’ > ) INTNX( ‘datetime-interval’, datetime, n < , ‘alignment’ > )
returns the date or datetime value of the beginning of the interval that is n intervals from the interval that contains the given date or datetime value. The optional alignment argument specifies that the returned date is aligned to the beginning, middle, or end of the interval. Beginning is the default. In addition, you can specify SAME (S) alignment. The SAME alignment bases the alignment of the calculated date or datetime value on the alignment of the input date or datetime value. As illustrated in the following example, the SAME alignment can be used to calculate the meaning of “same day next year” or “same day two weeks from now.”

   nextYear = INTNX( 'YEAR', '15Apr2007'D, 1, 'S' );
   TwoWeeks = INTNX( 'WEEK', '15Apr2007'D, 2, 'S' );

The preceding example returns ‘15Apr2008’D for nextYear and ‘29Apr2007’D for TwoWeeks. For all values of alignment, the number of discrete intervals n between the input date and the resulting date agrees with the input value. In the following example, the result is always that n2 = n1:

   date2 = INTNX( interval, date1, n1, align );
   n2 = INTCK( interval, date1, date2 );
The preceding example uses the DISCRETE method of the INTCK function by default. The result n2 = n1 does not always apply when the CONTINUOUS method of the INTCK function is specified. INTSEAS( ‘interval’ )
returns the length of the seasonal cycle when given a date or datetime interval. The length of a seasonal cycle is the number of intervals in a seasonal cycle. For example, when the interval for a time series is described as monthly, many procedures use the option INTERVAL=MONTH to indicate that each observation in the data corresponds to a particular month. Monthly data are considered to be periodic for a one-year seasonal cycle. There are 12 months in one year, so the number of intervals (months) in a seasonal cycle (year) is 12. For quarterly data, there
are 4 quarters in one year, so the number of intervals in a seasonal cycle is 4. The periodicity is not always one year. For example, INTERVAL=DAY is considered to have a seasonal cycle of one week, and because there are 7 days in a week, the number of intervals in a seasonal cycle is 7. INTSHIFT( ‘interval’ )
returns the shift interval that applies to the shift index if a subperiod is specified. For example, YEAR intervals are shifted by MONTH, so INTSHIFT(‘YEAR’) returns ‘MONTH’. INTTEST( ‘interval’ )
returns 1 if the interval name is valid, 0 otherwise. For example, VALID = INTTEST(’MONTH’); should set VALID to 1, while VALID = INTTEST(’NOTANINTERVAL’); should set VALID to 0. The INTTEST function can be useful in verifying which values of multiplier n and the shift index s are valid in constructing an interval name. JULDATE( date )
returns the Julian date from a SAS date value. The format of the Julian date is either yyddd or yyyyddd depending on the value of the system option YEARCUTOFF=. For example, using the default system option values, JULDATE( ’31DEC1999’D ); returns 99365, while JULDATE( ’31DEC1899’D ); returns 1899365. MDY( month, day, year )
returns a SAS date value for month, day, and year values. MINUTE( datetime )
returns the minute from a SAS time or datetime value. MONTH( date )
returns the numerical value for the month of the year from a SAS date value. For example, MONTH=MONTH(’01JAN2000’D); returns 1, the numerical value for January. NWKDOM( n, weekday, month, year )
returns a SAS date value for the nth weekday of the month and year specified. For example, Thanksgiving is always the fourth (n=4) Thursday (weekday=5) in November (month=11). Thus THANKS2000 = NWKDOM( 4, 5, 11, 2000); returns the SAS date value for Thanksgiving in the year 2000. The last weekday of a month can be specified by using n=5. Memorial Day in the United States is the last (n=5) Monday (weekday=2) in May (month=5), and so MEMORIAL2002 = NWKDOM( 5, 2, 5, 2002); returns the SAS date value for Memorial Day in 2002. Because n = 5 always specifies the last occurrence of the month and most months have only 4 instances of each day, the result for n = 5 is often the same as the result for n = 4. NWKDOM is useful for calculating the SAS date values of holidays that are defined in this manner. QTR( date )
returns the quarter of the year from a SAS date value. SECOND( date )
returns the second from a SAS time or datetime value.
TIME()
returns the current time of day. TIMEPART( datetime )
returns the time part of a SAS datetime value. TODAY()
returns the current date as a SAS date value. (TODAY is another name for the DATE function.) WEEK( date < , ‘descriptor’ > )
returns the week of year from a SAS date value. The algorithm used to calculate the week depends on the descriptor, which can take the value ‘U’, ‘V’, or ‘W’. If the descriptor is ‘U,’ weeks start on Sunday and the range is 0 to 53. If weeks 0 and 53 exist, they are only partial weeks. Week 52 can be a partial week. If the descriptor is ‘V’, the result is equivalent to the ISO 8601 week of year definition. The range is 1 to 53. Week 53 is a leap week. The first week of the year, Week 1, and the last week of the year, Week 52 or 53, can include days in another Gregorian calendar year. If the descriptor is ‘W’, weeks start on Monday and the range is 0 to 53. If weeks 0 and 53 exist, they are only partial weeks. Week 52 can be a partial week. WEEKDAY( date )
returns the day of the week from a SAS date value. For example, WEEKDAY=WEEKDAY(’17OCT1991’D); returns 5, the numerical value for Thursday. YEAR( date )
returns the year from a SAS date value. YYQ( year, quarter )
returns a SAS date value for year and quarter values.
References

National Retail Federation (2007), National Retail Federation 4-5-4 Calendar, Washington, DC: NRF.

Technical Committee ISO/TC 154 (2004), ISO 8601:2004 Data Elements and Interchange Formats–Information Interchange–Representation of Dates and Times, 3rd Edition, Technical report, International Organization for Standardization.
Chapter 5
SAS Macros and Functions

Contents
    SAS Macros  . . . . . . . . . . . . . . . . . . . . . . . . . . .  153
        BOXCOXAR Macro  . . . . . . . . . . . . . . . . . . . . . . .  154
        DFPVALUE Macro  . . . . . . . . . . . . . . . . . . . . . . .  157
        DFTEST Macro  . . . . . . . . . . . . . . . . . . . . . . . .  158
        LOGTEST Macro . . . . . . . . . . . . . . . . . . . . . . . .  160
    Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . .  162
        PROBDF Function for Dickey-Fuller Tests  . . . . . . . . . .  162
    References  . . . . . . . . . . . . . . . . . . . . . . . . . . .  167
SAS Macros This chapter describes several SAS macros and the SAS function PROBDF that are provided with SAS/ETS software. A SAS macro is a program that generates SAS statements. Macros make it easy to produce and execute complex SAS programs that would be time-consuming to write yourself. SAS/ETS software includes the following macros: %AR
generates statements to define autoregressive error models for the MODEL procedure.
%BOXCOXAR
investigates Box-Cox transformations useful for modeling and forecasting a time series.
%DFPVALUE
computes probabilities for Dickey-Fuller test statistics.
%DFTEST
performs Dickey-Fuller tests for unit roots in a time series process.
%LOGTEST
tests to see if a log transformation is appropriate for modeling and forecasting a time series.
%MA
generates statements to define moving-average error models for the MODEL procedure.
%PDL
generates statements to define polynomial-distributed lag models for the MODEL procedure.
These macros are part of the SAS AUTOCALL facility and are automatically available for use in your SAS program. See SAS Macro Language: Reference for information about the SAS macro facility. Since the %AR, %MA, and %PDL macros are used only with PROC MODEL, they are documented with the MODEL procedure. See the sections on the %AR, %MA, and %PDL macros in Chapter 18, “The MODEL Procedure,” for more information about these macros. The %BOXCOXAR, %DFPVALUE, %DFTEST, and %LOGTEST macros are described in the following sections.
BOXCOXAR Macro

The %BOXCOXAR macro finds the optimal Box-Cox transformation for a time series.

Transformations of the dependent variable are a useful way of dealing with nonlinear relationships or heteroscedasticity. For example, the logarithmic transformation is often used for modeling and forecasting time series that show exponential growth or that show variability proportional to the level of the series.

The Box-Cox transformation is a general class of power transformations that include the log transformation and no transformation as special cases. The Box-Cox transformation is

$$
Y_t =
\begin{cases}
\dfrac{(X_t + c)^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\[1ex]
\ln(X_t + c) & \text{for } \lambda = 0
\end{cases}
$$

The parameter λ controls the shape of the transformation. For example, λ = 0 produces a log transformation, while λ = 0.5 results in a square root transformation. When λ = 1, the transformed series differs from the original series by c − 1.

The constant c is optional. It can be used when some Xt values are negative or 0. You choose c so that the series Xt is always greater than −c.

The %BOXCOXAR macro tries a range of λ values and reports which of the values tried produces the optimal Box-Cox transformation. To evaluate different λ values, the %BOXCOXAR macro transforms the series with each λ value and fits an autoregressive model to the transformed series. It is assumed that this autoregressive model is a reasonably good approximation to the true time series model appropriate for the transformed series. The likelihood of the data under each autoregressive model is computed, and the λ value that produces the maximum likelihood over the values tried is reported as the optimal Box-Cox transformation for the series.

The %BOXCOXAR macro prints and optionally writes to a SAS data set all of the λ values tried, the corresponding log-likelihood value, and related statistics for the autoregressive model. You can control the range and number of λ values tried. You can also control the order of the autoregressive models fit to the transformed series. You can difference the transformed series before the autoregressive model is fit.
Note that the Box-Cox transformation might be appropriate when the data have a common distribution (apart from heteroscedasticity) but not when groups of observations for the variable are quite different. Thus the %BOXCOXAR macro is more often appropriate for time series data than for cross-sectional data.
Syntax

The form of the %BOXCOXAR macro is

%BOXCOXAR ( SAS-data-set, variable < , options > ) ;
The first argument, SAS-data-set, specifies the name of the SAS data set that contains the time series to be analyzed. The second argument, variable, specifies the time series variable name to be analyzed. The first two arguments are required. The following options can be used with the %BOXCOXAR macro. Options must follow the required arguments and are separated by commas. AR=n
specifies the order of the autoregressive model fit to the transformed series. The default is AR=5. CONST=value
specifies a constant c to be added to the series before transformation. Use the CONST= option when some values of the series are 0 or negative. The default is CONST=0. DIF=( differencing-list )
specifies the degrees of differencing to apply to the transformed series before the autoregressive model is fit. The differencing-list is a list of positive integers separated by commas and enclosed in parentheses. For example, DIF=(1,12) specifies that the transformed series be differenced once at lag 1 and once at lag 12. For more details, see the section “IDENTIFY Statement” on page 231 in Chapter 7, “The ARIMA Procedure.” LAMBDAHI=value
specifies the maximum value of lambda for the grid search. The default is LAMBDAHI=1. A large (in magnitude) LAMBDAHI= value can result in problems with floating point arithmetic. LAMBDALO=value
specifies the minimum value of lambda for the grid search. The default is LAMBDALO=0. A large (in magnitude) LAMBDALO= value can result in problems with floating point arithmetic. NLAMBDA=value
specifies the number of lambda values considered, including the LAMBDALO= and LAMBDAHI= option values. The default is NLAMBDA=2. OUT=SAS-data-set
writes the results to an output data set. The output data set includes the lambda values tried (LAMBDA), and for each lambda value, the log likelihood (LOGLIK), residual mean squared error (RMSE), Akaike Information Criterion (AIC), and Schwarz’s Bayesian Criterion (SBC).
PRINT=YES | NO
specifies whether results are printed. The default is PRINT=YES. The printed output contains the lambda values, log likelihoods, residual mean square errors, Akaike Information Criterion (AIC), and Schwarz’s Bayesian Criterion (SBC).
Results

The value of $\lambda$ that produces the maximum log likelihood is returned in the macro variable &BOXCOXAR. The value of the variable &BOXCOXAR is “ERROR” if the %BOXCOXAR macro is unable to compute the best transformation due to errors. This might be the result of large lambda values. The Box-Cox transformation parameter involves exponentiation of the data, so that large lambda values can cause floating-point overflow. Results are printed unless the PRINT=NO option is specified. Results are also stored in SAS data sets when the OUT= option is specified.
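For example, the following hypothetical call searches 11 $\lambda$ values between −1 and 1 and saves the grid of statistics to a data set. The data set TEST and variable Y are placeholders:

%boxcoxar( test, y, lambdalo=-1, lambdahi=1, nlambda=11, out=bcout );
%put Optimal lambda = &boxcoxar;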
Details

Assume that the transformed series $Y_t$ is a stationary pth-order autoregressive process generated by independent normally distributed innovations:

$(1 - \Theta(B))(Y_t - \mu) = \epsilon_t, \qquad \epsilon_t \sim \mathrm{iid}\ N(0, \sigma^2)$

Given these assumptions, the log-likelihood function of the transformed data $Y_t$ is

$l_Y(\cdot) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln(|\Sigma|) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(Y - \mu 1)'\, \Sigma^{-1} (Y - \mu 1)$

In this equation, n is the number of observations, $\mu$ is the mean of $Y_t$, 1 is the n-dimensional column vector of 1s, $\sigma^2$ is the innovation variance, $Y = (Y_1, \ldots, Y_n)'$, and $\Sigma$ is the covariance matrix of Y.

The log-likelihood function of the original data $X_1, \ldots, X_n$ is

$l_X(\cdot) = l_Y(\cdot) + (\lambda - 1) \sum_{t=1}^{n} \ln(X_t + c)$

where c is the value of the CONST= option.

For each value of $\lambda$, the maximum log-likelihood of the original data is obtained from the maximum log-likelihood of the transformed data given the maximum likelihood estimate of the autoregressive model. The maximum log-likelihood values are used to compute the Akaike Information Criterion (AIC) and Schwarz’s Bayesian Criterion (SBC) for each $\lambda$ value. The residual mean squared error based on the
maximum likelihood estimator is also produced. To compute the mean squared error, the predicted values from the model are transformed again to the original scale (Pankratz 1983, pp. 256–258, and Taylor 1986). After differencing as specified by the DIF= option, the process is assumed to be a stationary autoregressive process. You can check for stationarity of the series with the %DFTEST macro. If the process is not stationary, differencing with the DIF= option is recommended. For a process with moving-average terms, a large value for the AR= option might be appropriate.
DFPVALUE Macro

The %DFPVALUE macro computes the significance of the Dickey-Fuller test. The %DFPVALUE macro evaluates the p-value for the Dickey-Fuller test statistic for the test of H0: “The time series has a unit root” versus Ha: “The time series is stationary” using tables published by Dickey (1976) and Dickey, Hasza, and Fuller (1984).

The %DFPVALUE macro can compute p-values for tests of a simple unit root with lag 1 or for seasonal unit roots at lags 2, 4, or 12. The %DFPVALUE macro takes into account whether an intercept or deterministic time trend is assumed for the series. The %DFPVALUE macro is used by the %DFTEST macro described later in this chapter.

Note that the %DFPVALUE macro has been superseded by the PROBDF function described later in this chapter. It remains for compatibility with past releases of SAS/ETS.
Syntax

The %DFPVALUE macro has the following form:

%DFPVALUE ( tau, nobs < , options > ) ;
The first argument, tau, specifies the value of the Dickey-Fuller test statistic. The second argument, nobs, specifies the number of observations on which the test statistic is based. The first two arguments are required. The following options can be used with the %DFPVALUE macro. Options must follow the required arguments and are separated by commas. DLAG=1 | 2 | 4 | 12
specifies the lag period of the unit root to be tested. DLAG=1 specifies a one-period unit root test. DLAG=2 specifies a test for a seasonal unit root with lag 2. DLAG=4 specifies a test for a seasonal unit root with lag 4. DLAG=12 specifies a test for a seasonal unit root with lag 12. The default is DLAG=1. TREND=0 | 1 | 2
specifies the degree of deterministic time trend included in the model. TREND=0 specifies no trend and assumes the series has a zero mean. TREND=1 includes an intercept term.
TREND=2 specifies both an intercept and a deterministic linear time trend term. The default is TREND=1. TREND=2 is not allowed with DLAG=2, 4, or 12.
Results

The computed p-value is returned in the macro variable &DFPVALUE. If the p-value is less than 0.01 or larger than 0.99, the macro variable &DFPVALUE is set to 0.01 or 0.99, respectively.
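For example, the following sketch evaluates the p-value for a test statistic of −2.2 computed from 100 observations, using the default DLAG=1 and TREND=1 settings. The numbers are illustrative only:

%dfpvalue( -2.2, 100 );
%put p-value = &dfpvalue;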
Minimum Observations

The minimum number of observations required by the %DFPVALUE macro depends on the value of the DLAG= option. The minimum observations are as follows:

   DLAG=   Minimum Observations
   1       9
   2       6
   4       4
   12      12
DFTEST Macro

The %DFTEST macro performs the Dickey-Fuller unit root test. You can use the %DFTEST macro to decide whether a time series is stationary and to determine the order of differencing required for the time series analysis of a nonstationary series.

Most time series analysis methods require that the series to be analyzed be stationary. However, many economic time series are nonstationary processes. The usual approach to this problem is to difference the series. A time series that can be made stationary by differencing is said to have a unit root. For more information, see the discussion of this issue in the section “Getting Started: ARIMA Procedure” on page 195 of Chapter 7, “The ARIMA Procedure.”

The Dickey-Fuller test is a method for testing whether a time series has a unit root. The %DFTEST macro tests the hypothesis H0: “The time series has a unit root” versus Ha: “The time series is stationary” based on tables provided in Dickey (1976) and Dickey, Hasza, and Fuller (1984). The test can be applied for a simple unit root with lag 1, or for seasonal unit roots at lag 2, 4, or 12.

Note that the %DFTEST macro has been superseded by the PROC ARIMA stationarity tests. See Chapter 7, “The ARIMA Procedure,” for details.
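For example, comparable augmented Dickey-Fuller tests can be requested through the STATIONARITY= option of the IDENTIFY statement in PROC ARIMA. In this sketch, the data set TEST and variable Y are hypothetical placeholders:

proc arima data=test;
   /* augmented Dickey-Fuller tests with 0, 1, and 2 augmenting lags */
   identify var=y stationarity=(adf=(0,1,2));
run;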
Syntax

The %DFTEST macro has the following form:

%DFTEST ( SAS-data-set, variable < , options > ) ;
The first argument, SAS-data-set, specifies the name of the SAS data set that contains the time series variable to be analyzed. The second argument, variable, specifies the time series variable name to be analyzed. The first two arguments are required. The following options can be used with the %DFTEST macro. Options must follow the required arguments and are separated by commas. AR=n
specifies the order of autoregressive model fit after any differencing specified by the DIF= and DLAG= options. The default is AR=3. DIF=( differencing-list )
specifies the degrees of differencing to be applied to the series. The differencing list is a list of positive integers separated by commas and enclosed in parentheses. For example, DIF=(1,12) specifies that the series be differenced once at lag 1 and once at lag 12. For more details, see the section “IDENTIFY Statement” on page 231 in Chapter 7, “The ARIMA Procedure.” If the option DIF=($d_1, \ldots, d_k$) is specified, the series analyzed is $(1 - B^{d_1}) \cdots (1 - B^{d_k}) Y_t$, where $Y_t$ is the variable specified, and B is the backshift operator defined by $B Y_t = Y_{t-1}$. DLAG=1 | 2 | 4 | 12
specifies the lag to be tested for a unit root. The default is DLAG=1. OUT=SAS-data-set
writes residuals to an output data set. OUTSTAT=SAS-data-set
writes the test statistic, parameter estimates, and other statistics to an output data set. TREND=0 | 1 | 2
specifies the degree of deterministic time trend included in the model. TREND=0 includes no deterministic term and assumes the series has a zero mean. TREND=1 includes an intercept term. TREND=2 specifies an intercept and a linear time trend term. The default is TREND=1. TREND=2 is not allowed with DLAG=2, 4, or 12.
Results

The computed p-value is returned in the macro variable &DFTEST. If the p-value is less than 0.01 or larger than 0.99, the macro variable &DFTEST is set to 0.01 or 0.99, respectively. (The same value is given in the macro variable &DFPVALUE returned by the %DFPVALUE macro, which is used by the %DFTEST macro to compute the p-value.)

Results can be stored in SAS data sets with the OUT= and OUTSTAT= options.
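A minimal sketch of a call follows; the data set TEST and variable Y are placeholders, and the output data set name is hypothetical:

%dftest( test, y, ar=3, outstat=dfstat );
%put p=&dftest;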
Minimum Observations

The minimum number of observations required by the %DFTEST macro depends on the value of the DLAG= option. Let s be the sum of the differencing orders specified by the DIF= option, let t be the value of the TREND= option, and let p be the value of the AR= option. The minimum number of observations required is as follows:

   DLAG=   Minimum Observations
   1       1 + p + s + max(9, p + t + 2)
   2       2 + p + s + max(6, p + t + 2)
   4       4 + p + s + max(4, p + t + 2)
   12      12 + p + s + max(12, p + t + 2)
Observations are not used if they have missing values for the series or for any lag or difference used in the autoregressive model.
LOGTEST Macro

The %LOGTEST macro tests whether a logarithmic transformation is appropriate for modeling and forecasting a time series. The logarithmic transformation is often used for time series that show exponential growth or variability proportional to the level of the series.

The %LOGTEST macro fits an autoregressive model to a series and fits the same model to the log of the series. Both models are estimated by the maximum-likelihood method, and the maximum log-likelihood values for both autoregressive models are computed. These log-likelihood values are then expressed in terms of the original data and compared.

You can control the order of the autoregressive models. You can also difference the series and the log-transformed series before the autoregressive model is fit.

You can print the log-likelihood values and related statistics (AIC, SBC, and MSE) for the autoregressive models for the series and the log-transformed series. You can also output these statistics to a SAS data set.
Syntax

The %LOGTEST macro has the following form:

%LOGTEST ( SAS-data-set, variable < , options > ) ;
The first argument, SAS-data-set, specifies the name of the SAS data set that contains the time series variable to be analyzed. The second argument, variable, specifies the time series variable name to be analyzed. The first two arguments are required. The following options can be used with the %LOGTEST macro. Options must follow the required arguments and are separated by commas. AR=n
specifies the order of the autoregressive model fit to the series and the log-transformed series. The default is AR=5.
CONST=value
specifies a constant to be added to the series before transformation. Use the CONST= option when some values of the series are 0 or negative. The series analyzed must be greater than the negative of the CONST= value. The default is CONST=0. DIF=( differencing-list )
specifies the degrees of differencing applied to the original and log-transformed series before fitting the autoregressive model. The differencing-list is a list of positive integers separated by commas and enclosed in parentheses. For example, DIF=(1,12) specifies that the transformed series be differenced once at lag 1 and once at lag 12. For more details, see the section “IDENTIFY Statement” on page 231 in Chapter 7, “The ARIMA Procedure.” OUT=SAS-data-set
writes the results to an output data set. The output data set includes a variable TRANS that identifies the transformation (LOG or NONE), the log-likelihood value (LOGLIK), residual mean squared error (RMSE), Akaike Information Criterion (AIC), and Schwarz’s Bayesian Criterion (SBC) for the log-transformed and untransformed cases. PRINT=YES | NO
specifies whether the results are printed. The default is PRINT=NO. The printed output shows the log-likelihood value, residual mean squared error, Akaike Information Criterion (AIC), and Schwarz’s Bayesian Criterion (SBC) for the log-transformed and untransformed cases.
Results

The result of the test is returned in the macro variable &LOGTEST. The value of the &LOGTEST variable is ‘LOG’ if the model fit to the log-transformed data has a larger log likelihood than the model fit to the untransformed series. The value of the &LOGTEST variable is ‘NONE’ if the model fit to the untransformed data has a larger log likelihood. The variable &LOGTEST is set to ‘ERROR’ if the %LOGTEST macro is unable to compute the test due to errors. Results are printed when the PRINT=YES option is specified. Results are stored in SAS data sets when the OUT= option is specified.
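A minimal sketch of a call follows; the data set TEST and variable Y are placeholders:

%logtest( test, y, dif=(1), print=yes );
%put Preferred transformation: &logtest;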
Details

Assume that a time series $X_t$ is a stationary pth-order autoregressive process with normally distributed white noise innovations. That is,

$(1 - \Theta(B))(X_t - \mu_x) = \epsilon_t$

where $\mu_x$ is the mean of $X_t$.

The log-likelihood function of $X_t$ is

$l_1(\cdot) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln(|\Sigma_{xx}|) - \frac{n}{2}\ln(\sigma_e^2) - \frac{1}{2\sigma_e^2}(X - \mu_x 1)'\, \Sigma_{xx}^{-1} (X - \mu_x 1)$

where n is the number of observations, 1 is the n-dimensional column vector of 1s, $\sigma_e^2$ is the variance of the white noise, $X = (X_1, \ldots, X_n)'$, and $\Sigma_{xx}$ is the covariance matrix of X.

On the other hand, if the log-transformed time series $Y_t = \ln(X_t + c)$ is a stationary pth-order autoregressive process, the log-likelihood function of $X_t$ is

$l_0(\cdot) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln(|\Sigma_{yy}|) - \frac{n}{2}\ln(\sigma_e^2) - \frac{1}{2\sigma_e^2}(Y - \mu_y 1)'\, \Sigma_{yy}^{-1} (Y - \mu_y 1) - \sum_{t=1}^{n} \ln(X_t + c)$

where $\mu_y$ is the mean of $Y_t$, $Y = (Y_1, \ldots, Y_n)'$, and $\Sigma_{yy}$ is the covariance matrix of Y.

The %LOGTEST macro compares the maximum values of $l_1(\cdot)$ and $l_0(\cdot)$ and determines which is larger. The %LOGTEST macro also computes the Akaike Information Criterion (AIC), Schwarz’s Bayesian Criterion (SBC), and residual mean squared error based on the maximum likelihood estimator for the autoregressive model. For the mean squared error, retransformation of forecasts is based on Pankratz (1983, pp. 256–258).

After differencing as specified by the DIF= option, the process is assumed to be a stationary autoregressive process. You might want to check for stationarity of the series using the %DFTEST macro. If the process is not stationary, differencing with the DIF= option is recommended. For a process with moving-average terms, a large value for the AR= option might be appropriate.
Functions
PROBDF Function for Dickey-Fuller Tests

The PROBDF function calculates significance probabilities for Dickey-Fuller tests for unit roots in time series. The PROBDF function can be used wherever SAS library functions can be used, including DATA step programs, SCL programs, and PROC MODEL programs.
Syntax

PROBDF( x, n < , d < , type > > )
x
is the test statistic.
n
is the sample size. The minimum value of n allowed depends on the value specified for the third argument d. For d in the set (1,2,4,6,12), n must be an integer greater than or equal to max(2d, 5); for other values of d the minimum value of n is 24.
d
is an optional integer giving the degree of the unit root tested for. Specify d = 1 for tests of a simple unit root $(1 - B)$. Specify d equal to the seasonal cycle length for tests for a seasonal unit root $(1 - B^d)$. The default value of d is 1; that is, a test for a simple unit root $(1 - B)$ is assumed if d is not specified. The maximum value of d allowed is 12.
type
is an optional character argument that specifies the type of test statistic used. The values of type are the following:

   SZM   studentized test statistic for the zero mean (no intercept) case
   RZM   regression test statistic for the zero mean (no intercept) case
   SSM   studentized test statistic for the single mean (intercept) case
   RSM   regression test statistic for the single mean (intercept) case
   STR   studentized test statistic for the deterministic time trend case
   RTR   regression test statistic for the deterministic time trend case

The values STR and RTR are allowed only when d = 1. The default value of type is SZM.
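For example, the following DATA step sketch evaluates the probability for a hypothetical studentized zero-mean statistic of −1.5 based on 100 observations; the statistic value is illustrative only:

data _null_;
   p = probdf( -1.5, 100 );   /* defaults: d=1 (simple unit root), type=SZM */
   put p=;
run;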
Details

Theoretical Background
When a time series has a unit root, the series is nonstationary and the ordinary least squares (OLS) estimator is not normally distributed. Dickey (1976) and Dickey and Fuller (1979) studied the limiting distribution of the OLS estimator of autoregressive models for time series with a simple unit root. Dickey, Hasza, and Fuller (1984) obtained the limiting distribution for time series with seasonal unit roots.

Consider the (p+1)th-order autoregressive time series

$Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_{p+1} Y_{t-p-1} + e_t$

and its characteristic equation

$m^{p+1} - \alpha_1 m^p - \alpha_2 m^{p-1} - \cdots - \alpha_{p+1} = 0$

If all the characteristic roots are less than 1 in absolute value, $Y_t$ is stationary. $Y_t$ is nonstationary if there is a unit root. If there is a unit root, the sum of the autoregressive parameters is 1, and hence you can test for a unit root by testing whether the sum of the autoregressive parameters is 1 or not. The no-intercept model is parameterized as

$\nabla Y_t = \delta Y_{t-1} + \theta_1 \nabla Y_{t-1} + \cdots + \theta_p \nabla Y_{t-p} + e_t$

where $\nabla Y_t = Y_t - Y_{t-1}$ and

$\delta = \alpha_1 + \cdots + \alpha_{p+1} - 1$
$\theta_k = -\alpha_{k+1} - \cdots - \alpha_{p+1}$
The estimators are obtained by regressing $\nabla Y_t$ on $Y_{t-1}, \nabla Y_{t-1}, \ldots, \nabla Y_{t-p}$. The t statistic of the ordinary least squares estimator of $\delta$ is the test statistic for the unit root test.

If the type argument value specifies a test for a nonzero mean (intercept case), the autoregressive model includes a mean term $\alpha_0$. If the type argument value specifies a test for a time trend, the model also includes a time trend term and the model is as follows:

$\nabla Y_t = \alpha_0 + \gamma t + \delta Y_{t-1} + \theta_1 \nabla Y_{t-1} + \cdots + \theta_p \nabla Y_{t-p} + e_t$
For testing for a seasonal unit root, consider the multiplicative model

$(1 - \alpha_d B^d)(1 - \theta_1 B - \cdots - \theta_p B^p) Y_t = e_t$

Let $\nabla^d Y_t \equiv Y_t - Y_{t-d}$. The test statistic is calculated in the following steps:

1. Regress $\nabla^d Y_t$ on $\nabla^d Y_{t-1}, \ldots, \nabla^d Y_{t-p}$ to obtain the initial estimators $\hat\theta_i$ and compute residuals $\hat e_t$. Under the null hypothesis that $\alpha_d = 1$, $\hat\theta_i$ are consistent estimators of $\theta_i$.

2. Regress $\hat e_t$ on $(1 - \hat\theta_1 B - \cdots - \hat\theta_p B^p) Y_{t-d}, \nabla^d Y_{t-1}, \ldots, \nabla^d Y_{t-p}$ to obtain estimates of $\delta = \alpha_d - 1$ and $\theta_i - \hat\theta_i$.

The t ratio for the estimate of $\delta$ produced by the second step is used as a test statistic for testing for a seasonal unit root. The estimates of $\theta_i$ are obtained by adding the estimates of $\theta_i - \hat\theta_i$ from the second step to $\hat\theta_i$ from the first step.

The series $(1 - B^d) Y_t$ is assumed to be stationary, where d is the value of the third argument to the PROBDF function. If the series is an ARMA process, a large value of p might be desirable in order to obtain a reliable test statistic. To determine an appropriate value for p, see Said and Dickey (1984).

Test Statistics
The Dickey-Fuller test is used to test the null hypothesis that the time series exhibits a lag d unit root against the alternative of stationarity. The PROBDF function computes the probability of observing a test statistic more extreme than x under the assumption that the null hypothesis is true. You should reject the unit root hypothesis when PROBDF returns a small (significant) probability value.

There are several different versions of the Dickey-Fuller test. The PROBDF function supports six versions, as selected by the type argument. Specify the type value that corresponds to the way that you calculated the test statistic x.

The last two characters of the type value specify the kind of regression model used to compute the Dickey-Fuller test statistic. The meaning of the last two characters of the type value are as follows:

   ZM   zero mean or no-intercept case. The test statistic x is assumed to be computed from the regression model $y_t = \alpha_d y_{t-d} + e_t$.

   SM   single mean or intercept case. The test statistic x is assumed to be computed from the regression model $y_t = \alpha_0 + \alpha_d y_{t-d} + e_t$.

   TR   intercept and deterministic time trend case. The test statistic x is assumed to be computed from the regression model $y_t = \alpha_0 + \gamma t + \alpha_1 y_{t-1} + e_t$.

The first character of the type value specifies whether the regression test statistic or the studentized test statistic is used. Let $\hat\alpha_d$ be the estimated regression coefficient for the dth lag of the series, and let $se_{\hat\alpha}$ be the standard error of $\hat\alpha_d$. The meaning of the first character of the type value is as follows:

   R   the regression-coefficient-based test statistic, $x = n(\hat\alpha_d - 1)$

   S   the studentized test statistic, $x = (\hat\alpha_d - 1) / se_{\hat\alpha}$
See Dickey and Fuller (1979), Dickey, Hasza, and Fuller (1984), and Hamilton (1994) for more information about the Dickey-Fuller test null distribution. The preceding formulas are for the basic Dickey-Fuller test. The PROBDF function can also be used for the augmented Dickey-Fuller test, in which the error term $e_t$ is modeled as an autoregressive process; however, the test statistic is computed somewhat differently for the augmented Dickey-Fuller test. See Dickey, Hasza, and Fuller (1984) and Hamilton (1994) for information about seasonal and nonseasonal augmented Dickey-Fuller tests.

The PROBDF function is calculated from approximating functions fit to empirical quantiles that are produced by a Monte Carlo simulation that employs $10^8$ replications for each simulation. Separate simulations were performed for selected values of n and for d = 1, 2, 4, 6, 12 (where n and d are the second and third arguments to the PROBDF function). The maximum error of the PROBDF function is approximately $\pm 10^{-3}$ for d in the set (1,2,4,6,12) and can be slightly larger for other d values. (Because the number of simulation replications used to produce the PROBDF function is much greater than the 60,000 replications used by Dickey and Fuller (1979) and Dickey, Hasza, and Fuller (1984), the PROBDF function can be expected to produce results that are substantially more accurate than the critical values reported in those papers.)
Examples

Suppose the data set TEST contains 104 observations of the time series variable Y, and you want to test the null hypothesis that there exists a lag 4 seasonal unit root in the Y series. The following statements illustrate how to perform the single-mean Dickey-Fuller regression coefficient test using PROC REG and PROBDF.
data test1;
   set test;
   y4 = lag4(y);
run;

proc reg data=test1 outest=alpha;
   model y = y4 / noprint;
run;

data _null_;
   set alpha;
   x = 100 * ( y4 - 1 );
   p = probdf( x, 100, 4, "RSM" );
   put p= pvalue5.3;
run;
To perform the augmented Dickey-Fuller test, regress the differences of the series on lagged differences and on the lagged value of the series, and compute the test statistic from the regression coefficient for the lagged series. The following statements illustrate how to perform the single-mean augmented Dickey-Fuller studentized test for a simple unit root using PROC REG and PROBDF:

data test1;
   set test;
   yl = lag(y);
   yd = dif(y);
   yd1 = lag1(yd);
   yd2 = lag2(yd);
   yd3 = lag3(yd);
   yd4 = lag4(yd);
run;

proc reg data=test1 outest=alpha covout;
   model yd = yl yd1-yd4 / noprint;
run;

data _null_;
   set alpha;
   retain a;
   if _type_ = 'PARMS' then a = yl - 1;
   if _type_ = 'COV' & _NAME_ = 'YL' then do;
      x = a / sqrt(yl);
      p = probdf( x, 99, 1, "SSM" );
      put p= pvalue5.3;
   end;
run;
The %DFTEST macro provides an easier way to perform Dickey-Fuller tests. The following statements perform the same tests as the preceding example:

%dftest( test, y, ar=4 );
%put p=&dftest;
References

Dickey, D. A. (1976), “Estimation and Testing of Nonstationary Time Series,” Unpublished Ph.D. Thesis, Iowa State University, Ames.

Dickey, D. A. and Fuller, W. A. (1979), “Distribution of the Estimators for Autoregressive Time Series with a Unit Root,” Journal of the American Statistical Association, 74, 427–431.

Dickey, D. A., Hasza, D. P., and Fuller, W. A. (1984), “Testing for Unit Roots in Seasonal Time Series,” Journal of the American Statistical Association, 79, 355–367.

Hamilton, J. D. (1994), Time Series Analysis, Princeton, NJ: Princeton University Press.

Microsoft Excel 2000 Online Help, Redmond, WA: Microsoft Corp.

Pankratz, A. (1983), Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, New York: John Wiley.

Said, S. E. and Dickey, D. A. (1984), “Testing for Unit Roots in ARMA Models of Unknown Order,” Biometrika, 71, 599–607.

Taylor, J. M. G. (1986), “The Retransformed Mean After a Fitted Power Transformation,” Journal of the American Statistical Association, 81, 114–118.
Chapter 6
Nonlinear Optimization Methods

Contents
   Overview  169
   Options  169
   Details of Optimization Algorithms  179
      Overview  179
      Choosing an Optimization Algorithm  180
      Algorithm Descriptions  181
   Remote Monitoring  185
   ODS Table Names  187
   References  188
Overview

Several SAS/ETS procedures (COUNTREG, ENTROPY, MDC, QLIM, UCM, and VARMAX) use the nonlinear optimization (NLO) subsystem to perform nonlinear optimization. This chapter describes the options of the NLO system and some technical details of the available optimization methods. Note that not all options have been implemented for all procedures that use the NLO subsystem. You should check each procedure chapter for more details about which options are available.
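For example, procedures that support the NLOPTIONS statement (such as PROC UCM) let you select the optimizer and tighten the termination criteria. In the following sketch, the data set SERIES and the variables DATE and Y are hypothetical placeholders:

proc ucm data=series;
   id date interval=month;
   model y;
   irregular;
   level;
   /* request trust region optimization with a tighter gradient criterion */
   nloptions tech=trureg maxiter=200 gconv=1E-8;
run;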
Options

The following table summarizes the options available in the NLO system.

Table 6.1  NLO Options

   Option         Description

   Optimization Specifications
   TECHNIQUE=     minimization technique
   UPDATE=        update technique
   LINESEARCH=    line-search method
   LSPRECISION=   line-search precision
   HESCAL=        type of Hessian scaling
   INHESSIAN=     start for approximated Hessian
   RESTART=       iteration number for update restart

   Termination Criteria Specifications
   MAXFUNC=       maximum number of function calls
   MAXITER=       maximum number of iterations
   MINITER=       minimum number of iterations
   MAXTIME=       upper limit seconds of CPU time
   ABSCONV=       absolute function convergence criterion
   ABSFCONV=      absolute function convergence criterion
   ABSGCONV=      absolute gradient convergence criterion
   ABSXCONV=      absolute parameter convergence criterion
   FCONV=         relative function convergence criterion
   FCONV2=        relative function convergence criterion
   GCONV=         relative gradient convergence criterion
   XCONV=         relative parameter convergence criterion
   FSIZE=         used in FCONV, GCONV criterion
   XSIZE=         used in XCONV criterion

   Step Length Options
   DAMPSTEP=      damped steps in line search
   MAXSTEP=       maximum trust region radius
   INSTEP=        initial trust region radius

   Printed Output Options
   PALL           display (almost) all printed optimization-related output
   PHISTORY       display optimization history
   PHISTPARMS     display parameter estimates in each iteration
   PSHORT         reduce some default optimization-related output
   PSUMMARY       reduce most default optimization-related output
   NOPRINT        suppress all printed optimization-related output

   Remote Monitoring Options
   SOCKET=        specify the fileref for remote monitoring
These options are described in alphabetical order.

ABSCONV=r ABSTOL=r
specifies an absolute function convergence criterion. For minimization, termination requires $f(\theta^{(k)}) \le r$. The default value of r is the negative square root of the largest double-precision value, which serves only as a protection against overflows.
ABSFCONV=r[n] ABSFTOL=r[n]
specifies an absolute function convergence criterion. For all techniques except NMSIMP, termination requires a small change of the function value in successive iterations:

$|f(\theta^{(k-1)}) - f(\theta^{(k)})| \le r$

The same formula is used for the NMSIMP technique, but $\theta^{(k)}$ is defined as the vertex with the lowest function value, and $\theta^{(k-1)}$ is defined as the vertex with the highest function value in the simplex. The default value is r = 0. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can be terminated. ABSGCONV=r[n] ABSGTOL=r[n]
specifies an absolute gradient convergence criterion. Termination requires the maximum absolute gradient element to be small:

$\max_j |g_j(\theta^{(k)})| \le r$

This criterion is not used by the NMSIMP technique. The default value is r = 1E-5. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can be terminated. ABSXCONV=r[n] ABSXTOL=r[n]
specifies an absolute parameter convergence criterion. For all techniques except NMSIMP, termination requires a small Euclidean distance between successive parameter vectors:

$\| \theta^{(k)} - \theta^{(k-1)} \|_2 \le r$

For the NMSIMP technique, termination requires either a small length $\alpha^{(k)}$ of the vertices of a restart simplex,

$\alpha^{(k)} \le r$

or a small simplex size,

$\delta^{(k)} \le r$

where the simplex size $\delta^{(k)}$ is defined as the L1 distance from the simplex vertex $y^{(k)}$ with the smallest function value to the other n simplex points $\theta_l^{(k)} \ne y^{(k)}$:

$\delta^{(k)} = \sum_{\theta_l \ne y} \| \theta_l^{(k)} - y^{(k)} \|_1$

The default is r = 1E-8 for the NMSIMP technique and r = 0 otherwise. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate.
DAMPSTEP[=r]
specifies that the initial step length value $\alpha^{(0)}$ for each line search (used by the QUANEW, HYQUAN, CONGRA, or NEWRAP technique) cannot be larger than r times the step length value used in the former iteration. If the DAMPSTEP option is specified but r is not specified, the default is r = 2. The DAMPSTEP=r option can prevent the line-search algorithm from repeatedly stepping into regions where some objective functions are difficult to compute or where they could lead to floating point overflows during the computation of objective functions and their derivatives. The DAMPSTEP=r option can save time-costly function calls during the line searches of objective functions that result in very small steps. FCONV=r[n] FTOL=r[n]
specifies a relative function convergence criterion. For all techniques except NMSIMP, termination requires a small relative change of the function value in successive iterations:

$\frac{|f(\theta^{(k)}) - f(\theta^{(k-1)})|}{\max(|f(\theta^{(k-1)})|, \mathrm{FSIZE})} \le r$

where FSIZE is defined by the FSIZE= option. The same formula is used for the NMSIMP technique, but $\theta^{(k)}$ is defined as the vertex with the lowest function value, and $\theta^{(k-1)}$ is defined as the vertex with the highest function value in the simplex. The default value may depend on the procedure. In most cases, you can use the PALL option to find it. FCONV2=r[n] FTOL2=r[n]
specifies another function convergence criterion. For all techniques except NMSIMP, termination requires a small predicted reduction

$df^{(k)} \approx f(\theta^{(k)}) - f(\theta^{(k)} + s^{(k)})$

of the objective function. The predicted reduction

$df^{(k)} = -g^{(k)T} s^{(k)} - \frac{1}{2} s^{(k)T} H^{(k)} s^{(k)} = -\frac{1}{2} s^{(k)T} g^{(k)} \le r$

is computed by approximating the objective function f by the first two terms of the Taylor series and substituting the Newton step

$s^{(k)} = -[H^{(k)}]^{-1} g^{(k)}$

For the NMSIMP technique, termination requires a small standard deviation of the function values of the n+1 simplex vertices $\theta_l^{(k)}$, $l = 0, \ldots, n$:

$\sqrt{\frac{1}{n+1} \sum_l \left[ f(\theta_l^{(k)}) - \bar f(\theta^{(k)}) \right]^2} \le r$

where $\bar f(\theta^{(k)}) = \frac{1}{n+1} \sum_l f(\theta_l^{(k)})$. If there are $n_{act}$ boundary constraints active at $\theta^{(k)}$, the mean and standard deviation are computed only for the $n + 1 - n_{act}$ unconstrained vertices.

The default value is r = 1E-6 for the NMSIMP technique and r = 0 otherwise. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate. FSIZE=r
specifies the FSIZE parameter of the relative function and relative gradient termination criteria. The default value is r = 0. For more details, see the FCONV= and GCONV= options. GCONV=r[n] GTOL=r[n]
specifies a relative gradient convergence criterion. For all techniques except CONGRA and NMSIMP, termination requires that the normalized predicted function reduction be small:

$\frac{g(\theta^{(k)})^T [H^{(k)}]^{-1} g(\theta^{(k)})}{\max(|f(\theta^{(k)})|, \mathrm{FSIZE})} \le r$

where FSIZE is defined by the FSIZE= option. For the CONGRA technique (where a reliable Hessian estimate H is not available), the following criterion is used:

$\frac{\| g(\theta^{(k)}) \|_2^2 \; \| s(\theta^{(k)}) \|_2}{\| g(\theta^{(k)}) - g(\theta^{(k-1)}) \|_2 \; \max(|f(\theta^{(k)})|, \mathrm{FSIZE})} \le r$

This criterion is not used by the NMSIMP technique. The default value is r = 1E-8. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate. HESCAL=0|1|2|3 HS=0|1|2|3
specifies the scaling version of the Hessian matrix used in NRRIDG, TRUREG, NEWRAP, or DBLDOG optimization. If HS is not equal to 0, the first iteration and each restart iteration set the diagonal scaling matrix $D^{(0)} = \mathrm{diag}(d_i^{(0)})$:

$d_i^{(0)} = \sqrt{\max(|H_{i,i}^{(0)}|, \epsilon)}$

where $H_{i,i}^{(0)}$ are the diagonal elements of the Hessian. In every other iteration, the diagonal scaling matrix $D^{(0)} = \mathrm{diag}(d_i^{(0)})$ is updated depending on the HS option:

HS=0   specifies that no scaling is done.

HS=1   specifies the Moré (1978) scaling update:
       $d_i^{(k+1)} = \max\left( d_i^{(k)}, \sqrt{\max(|H_{i,i}^{(k)}|, \epsilon)} \right)$

HS=2   specifies the Dennis, Gay, and Welsch (1981) scaling update:
       $d_i^{(k+1)} = \max\left( 0.6\, d_i^{(k)}, \sqrt{\max(|H_{i,i}^{(k)}|, \epsilon)} \right)$

HS=3   specifies that $d_i$ is reset in each iteration:
       $d_i^{(k+1)} = \sqrt{\max(|H_{i,i}^{(k)}|, \epsilon)}$

In each scaling update, $\epsilon$ is the relative machine precision. The default value is HS=0. Scaling of the Hessian can be time consuming in the case where general linear constraints are active. INHESSIAN[=r] INHESS[=r]
specifies how the initial estimate of the approximate Hessian is defined for the quasi-Newton techniques QUANEW and DBLDOG. There are two alternatives:
If you do not use the r specification, the initial estimate of the approximate Hessian is set to the Hessian at $\theta^{(0)}$.
If you do use the r specification, the initial estimate of the approximate Hessian is set to the multiple of the identity matrix rI .
By default, if you do not specify the option INHESSIAN=r, the initial estimate of the approximate Hessian is set to the multiple of the identity matrix rI , where the scalar r is computed from the magnitude of the initial gradient. INSTEP=r
reduces the length of the first trial step during the line search of the first iterations. For highly nonlinear objective functions, such as the EXP function, the default initial radius of the trust-region algorithm TRUREG or DBLDOG or the default step length of the line-search algorithms can result in arithmetic overflows. If this occurs, you should specify decreasing values of 0 < r < 1 such as INSTEP=1E-1, INSTEP=1E-2, INSTEP=1E-4, and so on, until the iteration starts successfully.
For trust-region algorithms (TRUREG, DBLDOG), the INSTEP= option specifies a factor r > 0 for the initial radius of the trust region. The default initial trust-region radius is the length of the scaled gradient. This step corresponds to the default radius factor of r = 1.
For line-search algorithms (NEWRAP, CONGRA, QUANEW), the INSTEP= option specifies an upper bound for the initial step length for the line search during the first five iterations. The default initial step length is r = 1.
For the Nelder-Mead simplex algorithm, using TECH=NMSIMP, the INSTEP=r option defines the size of the start simplex.
LINESEARCH=i LIS=i
specifies the line-search method for the CONGRA, QUANEW, and NEWRAP optimization techniques. Refer to Fletcher (1987) for an introduction to line-search techniques. The value of i can be 1, ..., 8. For CONGRA, QUANEW, and NEWRAP, the default value is i = 2.
specifies a line-search method that needs the same number of function and gradient calls for cubic interpolation and cubic extrapolation; this method is similar to one used by the Harwell subroutine library.
LIS=2
specifies a line-search method that needs more function than gradient calls for quadratic and cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSPRECISION= option.
LIS=3
specifies a line-search method that needs the same number of function and gradient calls for cubic interpolation and cubic extrapolation; this method is implemented as shown in Fletcher (1987) and can be modified to an exact line search by using the LSPRECISION= option.
LIS=4
specifies a line-search method that needs the same number of function and gradient calls for stepwise extrapolation and cubic interpolation.
LIS=5
specifies a line-search method that is a modified version of LIS=4.
LIS=6
specifies golden section line search (Polak 1971), which uses only function values for linear approximation.
LIS=7
specifies bisection line search (Polak 1971), which uses only function values for linear approximation.
LIS=8
specifies the Armijo line-search technique (Polak 1971), which uses only function values for linear approximation.
LSPRECISION=r LSP=r
specifies the degree of accuracy that should be obtained by the line-search algorithms LIS=2 and LIS=3. Usually an imprecise line search is inexpensive and successful. For more difficult optimization problems, a more precise and expensive line search may be necessary (Fletcher 1987). The second line-search method (which is the default for the NEWRAP, QUANEW, and CONGRA techniques) and the third line-search method approach exact line search for small LSPRECISION= values. If you have numerical problems, you should try to decrease the LSPRECISION= value to obtain a more precise line search. The default values are shown in the following table.

Table 6.2  Line Search Precision Defaults

   TECH=    UPDATE=       LSP default
   QUANEW   DBFGS, BFGS   r = 0.4
   QUANEW   DDFP, DFP     r = 0.06
   CONGRA   all           r = 0.1
   NEWRAP   no update     r = 0.9
For more details, refer to Fletcher (1987). MAXFUNC=i MAXFU=i
specifies the maximum number i of function calls in the optimization process. The default values are
TRUREG, NRRIDG, NEWRAP: 125
QUANEW, DBLDOG: 500
CONGRA: 1000
NMSIMP: 3000
Note that the optimization can terminate only after completing a full iteration. Therefore, the number of function calls that is actually performed can exceed the number that is specified by the MAXFUNC= option. MAXITER=i MAXIT=i
specifies the maximum number i of iterations in the optimization process. The default values are
TRUREG, NRRIDG, NEWRAP: 50
QUANEW, DBLDOG: 200
CONGRA: 400
NMSIMP: 1000
These default values are also valid when i is specified as a missing value. MAXSTEP=r[n]
specifies an upper bound for the step length of the line-search algorithms during the first n iterations. By default, r is the largest double-precision value and n is the largest integer available. Setting this option can improve the speed of convergence for the CONGRA, QUANEW, and NEWRAP techniques. MAXTIME=r
specifies an upper limit of r seconds of CPU time for the optimization process. The default value is the largest floating-point double representation of your computer. Note that the time specified by the MAXTIME= option is checked only once at the end of each iteration. Therefore, the actual running time can be much longer than that specified by the MAXTIME= option. The actual running time includes the rest of the time needed to finish the iteration and the time needed to generate the output of the results. MINITER=i MINIT=i
specifies the minimum number of iterations. The default value is 0. If you request more iterations than are actually needed for convergence to a stationary point, the optimization algorithms can behave strangely. For example, the effect of rounding errors can prevent the algorithm from continuing for the required number of iterations. NOPRINT
suppresses the output. (See procedure documentation for availability of this option.) PALL
displays all optional output for optimization. (See procedure documentation for availability of this option.) PHISTORY
displays the optimization history. (See procedure documentation for availability of this option.)
PHISTPARMS
display parameter estimates in each iteration. (See procedure documentation for availability of this option.) PINIT
displays the initial values and derivatives (if available). (See procedure documentation for availability of this option.) PSHORT
restricts the amount of default output. (See procedure documentation for availability of this option.) PSUMMARY
restricts the amount of default displayed output to a short form of iteration history and notes, warnings, and errors. (See procedure documentation for availability of this option.) RESTART=i > 0 REST=i > 0
specifies that the QUANEW or CONGRA algorithm is restarted with a steepest descent/ascent search direction after, at most, i iterations. Default values are as follows:
CONGRA UPDATE=PB: restart is performed automatically, i is not used.
CONGRA UPDATE≠PB: i = min(10n, 80), where n is the number of parameters.
QUANEW i is the largest integer available.
SOCKET=fileref
specifies the fileref that contains the information needed for remote monitoring. See the section “Remote Monitoring” on page 185 for more details. TECHNIQUE=value TECH=value
specifies the optimization technique. Valid values are as follows:
CONGRA performs a conjugate-gradient optimization, which can be more precisely specified with the UPDATE= option and modified with the LINESEARCH= option. When you specify this option, UPDATE=PB by default.
DBLDOG performs a version of double-dogleg optimization, which can be more precisely specified with the UPDATE= option. When you specify this option, UPDATE=DBFGS by default.
NMSIMP performs a Nelder-Mead simplex optimization.
NONE does not perform any optimization. This option can be used as follows:
– to perform a grid search without optimization
– to compute estimates and predictions that cannot be obtained efficiently with any of the optimization techniques
NEWRAP performs a Newton-Raphson optimization that combines a line-search algorithm with ridging. The line-search algorithm LIS=2 is the default method.
NRRIDG performs a Newton-Raphson optimization with ridging.
QUANEW performs a quasi-Newton optimization, which can be defined more precisely with the UPDATE= option and modified with the LINESEARCH= option. This is the default estimation method.
TRUREG performs a trust region optimization.
UPDATE=method UPD=method
specifies the update method for the QUANEW, DBLDOG, or CONGRA optimization technique. Not every update method can be used with each optimizer. Valid methods are as follows:
BFGS performs the original Broyden, Fletcher, Goldfarb, and Shanno (BFGS) update of the inverse Hessian matrix.
DBFGS performs the dual BFGS update of the Cholesky factor of the Hessian matrix. This is the default update method.
DDFP performs the dual Davidon, Fletcher, and Powell (DFP) update of the Cholesky factor of the Hessian matrix.
DFP performs the original DFP update of the inverse Hessian matrix.
PB performs the automatic restart update method of Powell (1977) and Beale (1972).
FR performs the Fletcher-Reeves update (Fletcher 1987).
PR performs the Polak-Ribiere update (Fletcher 1987).
CD performs a conjugate-descent update of Fletcher (1987).
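Where a procedure exposes these options (availability varies by procedure), a technique and an update method can be selected together. The following one-line sketch is hypothetical, not required syntax:

nloptions tech=congra update=pr maxiter=400;   /* conjugate gradient with the Polak-Ribiere update */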
XCONV=r[n] XTOL=r[n]
specifies the relative parameter convergence criterion. For all techniques except NMSIMP, termination requires a small relative parameter change in subsequent iterations:

$\frac{\max_j |\theta_j^{(k)} - \theta_j^{(k-1)}|}{\max(|\theta_j^{(k)}|, |\theta_j^{(k-1)}|, \mathrm{XSIZE})} \le r$

For the NMSIMP technique, the same formula is used, but $\theta_j^{(k)}$ is defined as the vertex with the lowest function value and $\theta_j^{(k-1)}$ is defined as the vertex with the highest function value in the simplex. The default value is r = 1E-8 for the NMSIMP technique and r = 0 otherwise. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can be terminated. XSIZE=r > 0
specifies the XSIZE parameter of the relative parameter termination criterion. The default value is r = 0. For more detail, see the XCONV= option.
Details of Optimization Algorithms
Overview

There are several optimization techniques available. You can choose a particular optimizer with the TECH=name option in the PROC statement or NLOPTIONS statement.

Table 6.3  Optimization Techniques

   Algorithm                                        TECH=
   trust region method                              TRUREG
   Newton-Raphson method with line search           NEWRAP
   Newton-Raphson method with ridging               NRRIDG
   quasi-Newton methods (DBFGS, DDFP, BFGS, DFP)    QUANEW
   double-dogleg method (DBFGS, DDFP)               DBLDOG
   conjugate gradient methods (PB, FR, PR, CD)      CONGRA
   Nelder-Mead simplex method                       NMSIMP
No algorithm for optimizing general nonlinear functions exists that always finds the global optimum for a general nonlinear minimization problem in a reasonable amount of time. Since no single optimization technique is invariably superior to others, NLO provides a variety of optimization techniques that work well in various circumstances. However, you can devise problems for which none of the techniques in NLO can find the correct solution. Moreover, nonlinear optimization can
be computationally expensive in terms of time and memory, so you must be careful when matching an algorithm to a problem.

All optimization techniques in NLO use $O(n^2)$ memory except the conjugate gradient methods, which use only $O(n)$ memory and are designed to optimize problems with many parameters. These iterative techniques require repeated computation of the following:

– the function value (optimization criterion)
– the gradient vector (first-order partial derivatives)
– for some techniques, the (approximate) Hessian matrix (second-order partial derivatives)

However, since each of the optimizers requires different derivatives, some computational efficiencies can be gained. Table 6.4 shows, for each optimization technique, which derivatives are required. (FOD means that first-order derivatives or the gradient is computed; SOD means that second-order derivatives or the Hessian is computed.)

Table 6.4  Optimization Computations

   Algorithm   FOD   SOD
   TRUREG      x     x
   NEWRAP      x     x
   NRRIDG      x     x
   QUANEW      x     -
   DBLDOG      x     -
   CONGRA      x     -
   NMSIMP      -     -
Each optimization method employs one or more convergence criteria that determine when it has converged. The various termination criteria are listed and described in the previous section. An algorithm is considered to have converged when any one of the convergence criteria is satisfied. For example, under the default settings, the QUANEW algorithm will converge if ABSGCONV < 1E-5, FCONV < $10^{-\mathrm{FDIGITS}}$, or GCONV < 1E-8.
Choosing an Optimization Algorithm

The factors that go into choosing a particular optimization technique for a particular problem are complex and might involve trial and error.

For many optimization problems, computing the gradient takes more computer time than computing the function value, and computing the Hessian sometimes takes much more computer time and memory than computing the gradient, especially when there are many decision variables. Unfortunately, optimization techniques that do not use some kind of Hessian approximation usually require many more iterations than techniques that do use a Hessian matrix, and as a result the total run time of
these techniques is often longer. Techniques that do not use the Hessian also tend to be less reliable. For example, they can more easily terminate at stationary points rather than at global optima.

A few general remarks about the various optimization techniques follow.

The second-derivative methods TRUREG, NEWRAP, and NRRIDG are best for small problems where the Hessian matrix is not expensive to compute. Sometimes the NRRIDG algorithm can be faster than the TRUREG algorithm, but TRUREG can be more stable. The NRRIDG algorithm requires only one matrix with n(n+1)/2 double words; TRUREG and NEWRAP require two such matrices.

The first-derivative methods QUANEW and DBLDOG are best for medium-sized problems where the objective function and the gradient are much faster to evaluate than the Hessian. The QUANEW and DBLDOG algorithms, in general, require more iterations than TRUREG, NRRIDG, and NEWRAP, but each iteration can be much faster. The QUANEW and DBLDOG algorithms require only the gradient to update an approximate Hessian, and they require slightly less memory than TRUREG or NEWRAP (essentially one matrix with n(n+1)/2 double words). QUANEW is the default optimization method.

The first-derivative method CONGRA is best for large problems where the objective function and the gradient can be computed much faster than the Hessian and where too much memory is required to store the (approximate) Hessian. The CONGRA algorithm, in general, requires more iterations than QUANEW or DBLDOG, but each iteration can be much faster. Since CONGRA requires only a factor of n double-word memory, many large applications can be solved only by CONGRA.

The no-derivative method NMSIMP is best for small problems where derivatives are not continuous or are very difficult to compute.
Algorithm Descriptions

Some details about the optimization techniques are as follows.
Trust Region Optimization (TRUREG)
The trust region method uses the gradient $g(\theta^{(k)})$ and the Hessian matrix $H(\theta^{(k)})$; thus, it requires that the objective function $f(\theta)$ have continuous first- and second-order derivatives inside the feasible region.

The trust region method iteratively optimizes a quadratic approximation to the nonlinear objective function within a hyperelliptic trust region with radius $\Delta$ that constrains the step size that corresponds to the quality of the quadratic approximation. The trust region method is implemented using Dennis, Gay, and Welsch (1981), Gay (1983), and Moré and Sorensen (1983).

The trust region method performs well for small- to medium-sized problems, and it does not need many function, gradient, and Hessian calls. However, if the computation of the Hessian matrix is
computationally expensive, one of the (dual) quasi-Newton or conjugate gradient algorithms may be more efficient.
Newton-Raphson Optimization with Line Search (NEWRAP)
The NEWRAP technique uses the gradient $g(\theta^{(k)})$ and the Hessian matrix $H(\theta^{(k)})$; thus, it requires that the objective function have continuous first- and second-order derivatives inside the feasible region. If second-order derivatives are computed efficiently and precisely, the NEWRAP method can perform well for medium-sized to large problems, and it does not need many function, gradient, and Hessian calls.

This algorithm uses a pure Newton step when the Hessian is positive definite and when the Newton step reduces the value of the objective function successfully. Otherwise, a combination of ridging and line search is performed to compute successful steps. If the Hessian is not positive definite, a multiple of the identity matrix is added to the Hessian matrix to make it positive definite.

In each iteration, a line search is performed along the search direction to find an approximate optimum of the objective function. The default line-search method uses quadratic interpolation and cubic extrapolation (LIS=2).
Newton-Raphson Ridge Optimization (NRRIDG)
The NRRIDG technique uses the gradient $g(\theta^{(k)})$ and the Hessian matrix $H(\theta^{(k)})$; thus, it requires that the objective function have continuous first- and second-order derivatives inside the feasible region.

This algorithm uses a pure Newton step when the Hessian is positive definite and when the Newton step reduces the value of the objective function successfully. If at least one of these two conditions is not satisfied, a multiple of the identity matrix is added to the Hessian matrix.

The NRRIDG method performs well for small- to medium-sized problems, and it does not require many function, gradient, and Hessian calls. However, if the computation of the Hessian matrix is computationally expensive, one of the (dual) quasi-Newton or conjugate gradient algorithms might be more efficient.

Since the NRRIDG technique uses an orthogonal decomposition of the approximate Hessian, each iteration of NRRIDG can be slower than that of the NEWRAP technique, which works with Cholesky decomposition. Usually, however, NRRIDG requires fewer iterations than NEWRAP.
Quasi-Newton Optimization (QUANEW)
The (dual) quasi-Newton method uses the gradient $g(\theta^{(k)})$, and it does not need to compute second-order derivatives since they are approximated. It works well for medium to moderately large optimization problems where the objective function and the gradient are much faster to compute than the Hessian; but, in general, it requires more iterations than the TRUREG, NEWRAP, and NRRIDG techniques, which compute second-order derivatives. QUANEW is the default optimization algorithm because it provides an appropriate balance between the speed and stability required for most nonlinear mixed model applications.
The QUANEW technique is one of the following, depending upon the value of the UPDATE= option:

– the original quasi-Newton algorithm, which updates an approximation of the inverse Hessian
– the dual quasi-Newton algorithm, which updates the Cholesky factor of an approximate Hessian (default)

You can specify four update formulas with the UPDATE= option:

DBFGS performs the dual Broyden, Fletcher, Goldfarb, and Shanno (BFGS) update of the Cholesky factor of the Hessian matrix. This is the default.

DDFP performs the dual Davidon, Fletcher, and Powell (DFP) update of the Cholesky factor of the Hessian matrix.

BFGS performs the original BFGS update of the inverse Hessian matrix.

DFP performs the original DFP update of the inverse Hessian matrix.

In each iteration, a line search is performed along the search direction to find an approximate optimum. The default line-search method uses quadratic interpolation and cubic extrapolation to obtain a step size $\alpha$ satisfying the Goldstein conditions. One of the Goldstein conditions can be violated if the feasible region defines an upper limit of the step size. Violating the left-side Goldstein condition can affect the positive definiteness of the quasi-Newton update. In that case, either the update is skipped or the iterations are restarted with an identity matrix, resulting in the steepest descent or ascent search direction. You can specify line-search algorithms other than the default with the LIS= option.

The QUANEW algorithm performs its own line-search technique. All options and parameters (except the INSTEP= option) that control the line search in the other algorithms do not apply here. In several applications, large steps in the first iterations are troublesome. You can use the INSTEP= option to impose an upper bound for the step size $\alpha$ during the first five iterations. You can also use the INHESSIAN[=r] option to specify a different starting approximation for the Hessian. If you specify only the INHESSIAN option, the Cholesky factor of a (possibly ridged) finite difference approximation of the Hessian is used to initialize the quasi-Newton update process. The values of the LCSINGULAR=, LCEPSILON=, and LCDEACT= options, which control the processing of linear and boundary constraints, are valid only for the quadratic programming subroutine used in each iteration of the QUANEW algorithm.
Double-Dogleg Optimization (DBLDOG)
The double-dogleg optimization method combines the ideas of the quasi-Newton and trust region methods. In each iteration, the double-dogleg algorithm computes the step s^(k) as the linear combination of the steepest descent or ascent search direction s1^(k) and a quasi-Newton search direction s2^(k):

s^(k) = α1 s1^(k) + α2 s2^(k)
The step is requested to remain within a prespecified trust region radius; see Fletcher (1987, p. 107). Thus, the DBLDOG subroutine uses the dual quasi-Newton update but does not perform a line search. You can specify two update formulas with the UPDATE= option:

DBFGS
performs the dual Broyden, Fletcher, Goldfarb, and Shanno update of the Cholesky factor of the Hessian matrix. This is the default.

DDFP
performs the dual Davidon, Fletcher, and Powell update of the Cholesky factor of the Hessian matrix.

The double-dogleg optimization technique works well for medium to moderately large optimization problems where the objective function and the gradient are much faster to compute than the Hessian. The implementation is based on Dennis and Mei (1979) and Gay (1983), but it is extended for dealing with boundary and linear constraints. The DBLDOG technique generally requires more iterations than the TRUREG, NEWRAP, or NRRIDG technique, which requires second-order derivatives; however, each of the DBLDOG iterations is computationally cheap. Furthermore, the DBLDOG technique requires only gradient calls for the update of the Cholesky factor of an approximate Hessian.

Conjugate Gradient Optimization (CONGRA)
Second-order derivatives are not required by the CONGRA algorithm and are not even approximated. The CONGRA algorithm can be expensive in function and gradient calls, but it requires only O(n) memory for unconstrained optimization. In general, many iterations are required to obtain a precise solution, but each of the CONGRA iterations is computationally cheap. You can specify four different update formulas for generating the conjugate directions by using the UPDATE= option:

PB
performs the automatic restart update method of Powell (1977) and Beale (1972). This is the default.

FR
performs the Fletcher-Reeves update (Fletcher 1987).

PR
performs the Polak-Ribiere update (Fletcher 1987).

CD
performs a conjugate-descent update of Fletcher (1987).

The default, UPDATE=PB, behaved best in most test examples. You are advised to avoid the option UPDATE=CD, which behaved worst in most test examples. The CONGRA subroutine should be used for optimization problems with large n. For the unconstrained or boundary constrained case, CONGRA requires only O(n) bytes of working memory, whereas all other optimization methods require O(n²) bytes of working memory. During n successive iterations, uninterrupted by restarts or changes in the working set, the conjugate gradient algorithm computes a cycle of n conjugate search directions. In each iteration, a line search is performed along the search direction to find an approximate optimum of the objective function. The default line-search method uses quadratic interpolation and cubic extrapolation to obtain a step size α satisfying the Goldstein conditions. One of the Goldstein conditions can be violated if the feasible region defines an upper limit for the step size. Other line-search algorithms can be specified with the LIS= option.
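As an illustration, a problem with a large number of parameters might request the conjugate gradient technique with the default Powell-Beale update. This is a hedged sketch: the procedure, data set, and model are hypothetical, and it is assumed that the procedure accepts the NLO options described in this chapter.

proc entropy data=bigdata tech=congra update=pb lis=2;
   /* conjugate gradient: only O(n) working memory is needed;
      LIS=2 selects an alternative line-search algorithm */
   model y = x1-x200;
run;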
Nelder-Mead Simplex Optimization (NMSIMP)
The Nelder-Mead simplex method does not use any derivatives and does not assume that the objective function has continuous derivatives. The objective function itself needs to be continuous. This technique is quite expensive in the number of function calls, and it might be unable to generate precise results for n much greater than 40. The original Nelder-Mead simplex algorithm is implemented and extended to boundary constraints. This algorithm does not compute the objective for infeasible points, but it changes the shape of the simplex by adapting to the nonlinearities of the objective function, which contributes to an increased speed of convergence. It uses a special termination criterion.
Remote Monitoring

SAS/EmMonitor is a Windows application that enables you to monitor, and optionally stop, a CPU-intensive application that uses the NLO subsystem and runs on a remote server. On the server side, a FILENAME statement assigns a fileref to a SOCKET-type device that defines the IP address of the client and the port number for listening. The fileref is then specified in the SOCKET= option in the PROC statement to control the EmMonitor. The following statements show an example of server-side statements for PROC ENTROPY.

data one;
   do t = 1 to 10;
      x1 = 5 * ranuni(456);
      x2 = 10 * ranuni( 456);
      x3 = 2 * rannor(1456);
      e1 = rannor(1456);
      e2 = rannor(4560);
      tmp1 = 0.5 * e1 - 0.1 * e2;
      tmp2 = -0.1 * e1 - 0.3 * e2;
      y1 = 7 + 8.5*x1 + 2*x2 + tmp1;
      y2 = -3 + -2*x1 + x2 + 3*x3 + tmp2;
      output;
   end;
run;

filename sock socket 'your.pc.address.com:6943';

proc entropy data=one tech=tr gmenm gconv=2.e-5 socket=sock;
   model y1 = x1 x2 x3;
run;
On the client side, the EmMonitor application is started with the following syntax:

EmMonitor options
The options are:
186 F Chapter 6: Nonlinear Optimization Methods
-p port_number    defines the port number
-t title          defines the title of the EmMonitor window
-k                keeps the monitor alive when the iteration is completed
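For example, to listen on the port used in the server-side example above, give the window a descriptive title, and keep the window open after the iteration finishes, you might start the client as follows (a hypothetical invocation built from the options just listed):

EmMonitor -p 6943 -t "ENTROPY estimation" -k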
The default port number is 6943. The server does not need to be running when you start the EmMonitor, and you can start or dismiss the server at any time during the iteration process. You only need to remember the port number. Starting the PC client, or closing it prematurely, does not have any effect on the server side. In other words, the iteration process continues until one of the criteria for termination is met.

Figure 6.1 through Figure 6.4 show screenshots of the application on the client side.

Figure 6.1  Graph Tab Group 0
Figure 6.2 Graph Tab Group 1
Figure 6.3 Status Tab
Figure 6.4 Options Tab
ODS Table Names

The NLO subsystem assigns a name to each table it creates. You can use these names when using the Output Delivery System (ODS) to select tables and create output data sets. Not all tables are created by all SAS/ETS procedures that use the NLO subsystem. You should check the procedure chapter for more details. The names are listed in the following table.
Table 6.5  ODS Tables Produced by the NLO Subsystem

ODS Table Name               Description
ConvergenceStatus            Convergence status
InputOptions                 Input options
IterHist                     Iteration history
IterStart                    Iteration start
IterStop                     Iteration stop
Lagrange                     Lagrange multipliers at the solution
LinCon                       Linear constraints
LinConDel                    Deleted linear constraints
LinConSol                    Linear constraints at the solution
ParameterEstimatesResults    Estimates at the results
ParameterEstimatesStart      Estimates at the start of the iterations
ProblemDescription           Problem description
ProjGrad                     Projected gradients
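For example, the ODS OUTPUT statement can capture any of these tables in a data set. The following sketch uses the data set ONE from the remote-monitoring example above; the choice of procedure and model is illustrative, and the table names come from Table 6.5.

ods output IterHist=work.iterhist
           ConvergenceStatus=work.convstatus;

proc entropy data=one tech=quanew;
   /* the iteration history and convergence status tables
      are written to WORK.ITERHIST and WORK.CONVSTATUS */
   model y1 = x1 x2 x3;
run;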
References

Beale, E.M.L. (1972), "A Derivation of Conjugate Gradients," in Numerical Methods for Nonlinear Optimization, ed. F.A. Lootsma, London: Academic Press.

Dennis, J.E., Gay, D.M., and Welsch, R.E. (1981), "An Adaptive Nonlinear Least-Squares Algorithm," ACM Transactions on Mathematical Software, 7, 348–368.

Dennis, J.E. and Mei, H.H.W. (1979), "Two New Unconstrained Optimization Algorithms Which Use Function and Gradient Values," Journal of Optimization Theory and Applications, 28, 453–482.

Dennis, J.E. and Schnabel, R.B. (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Englewood Cliffs, NJ: Prentice-Hall.

Fletcher, R. (1987), Practical Methods of Optimization, Second Edition, Chichester: John Wiley & Sons, Inc.

Gay, D.M. (1983), "Subroutines for Unconstrained Minimization," ACM Transactions on Mathematical Software, 9, 503–524.

Moré, J.J. (1978), "The Levenberg-Marquardt Algorithm: Implementation and Theory," in Lecture Notes in Mathematics 630, ed. G.A. Watson, Berlin-Heidelberg-New York: Springer Verlag.

Moré, J.J. and Sorensen, D.C. (1983), "Computing a Trust-Region Step," SIAM Journal on Scientific and Statistical Computing, 4, 553–572.

Polak, E. (1971), Computational Methods in Optimization, New York: Academic Press.
Powell, M.J.D. (1977), "Restart Procedures for the Conjugate Gradient Method," Mathematical Programming, 12, 241–254.
Part II
Procedure Reference
Chapter 7
The ARIMA Procedure

Contents
Overview: ARIMA Procedure
Getting Started: ARIMA Procedure
    The Three Stages of ARIMA Modeling
    Identification Stage
    Estimation and Diagnostic Checking Stage
    Forecasting Stage
    Using ARIMA Procedure Statements
    General Notation for ARIMA Models
    Stationarity
    Differencing
    Subset, Seasonal, and Factored ARMA Models
    Input Variables and Regression with ARMA Errors
    Intervention Models and Interrupted Time Series
    Rational Transfer Functions and Distributed Lag Models
    Forecasting with Input Variables
    Data Requirements
Syntax: ARIMA Procedure
    Functional Summary
    PROC ARIMA Statement
    BY Statement
    IDENTIFY Statement
    ESTIMATE Statement
    OUTLIER Statement
    FORECAST Statement
Details: ARIMA Procedure
    The Inverse Autocorrelation Function
    The Partial Autocorrelation Function
    The Cross-Correlation Function
    The ESACF Method
    The MINIC Method
    The SCAN Method
    Stationarity Tests
    Prewhitening
    Identifying Transfer Function Models
    Missing Values and Autocorrelations
    Estimation Details
    Specifying Inputs and Transfer Functions
    Initial Values
    Stationarity and Invertibility
    Naming of Model Parameters
    Missing Values and Estimation and Forecasting
    Forecasting Details
    Forecasting Log Transformed Data
    Specifying Series Periodicity
    Detecting Outliers
    OUT= Data Set
    OUTCOV= Data Set
    OUTEST= Data Set
    OUTMODEL= SAS Data Set
    OUTSTAT= Data Set
    Printed Output
    ODS Table Names
    Statistical Graphics
Examples: ARIMA Procedure
    Example 7.1: Simulated IMA Model
    Example 7.2: Seasonal Model for the Airline Series
    Example 7.3: Model for Series J Data from Box and Jenkins
    Example 7.4: An Intervention Model for Ozone Data
    Example 7.5: Using Diagnostics to Identify ARIMA Models
    Example 7.6: Detection of Level Changes in the Nile River Data
    Example 7.7: Iterative Outlier Detection
References
Overview: ARIMA Procedure

The ARIMA procedure analyzes and forecasts equally spaced univariate time series data, transfer function data, and intervention data by using the autoregressive integrated moving-average (ARIMA) or autoregressive moving-average (ARMA) model. An ARIMA model predicts a value in a response time series as a linear combination of its own past values, past errors (also called shocks or innovations), and current and past values of other time series.

The ARIMA approach was first popularized by Box and Jenkins, and ARIMA models are often referred to as Box-Jenkins models. The general transfer function model employed by the ARIMA procedure was discussed by Box and Tiao (1975). When an ARIMA model includes other time series as input variables, the model is sometimes referred to as an ARIMAX model. Pankratz (1991) refers to the ARIMAX model as dynamic regression.
The ARIMA procedure provides a comprehensive set of tools for univariate time series model identification, parameter estimation, and forecasting, and it offers great flexibility in the kinds of ARIMA or ARIMAX models that can be analyzed. The ARIMA procedure supports seasonal, subset, and factored ARIMA models; intervention or interrupted time series models; multiple regression analysis with ARMA errors; and rational transfer function models of any complexity. The design of PROC ARIMA closely follows the Box-Jenkins strategy for time series modeling with features for the identification, estimation and diagnostic checking, and forecasting steps of the Box-Jenkins method. Before you use PROC ARIMA, you should be familiar with Box-Jenkins methods, and you should exercise care and judgment when you use the ARIMA procedure. The ARIMA class of time series models is complex and powerful, and some degree of expertise is needed to use them correctly.
Getting Started: ARIMA Procedure

This section outlines the use of the ARIMA procedure and gives a cursory description of the ARIMA modeling process for readers who are less familiar with these methods.
The Three Stages of ARIMA Modeling

The analysis performed by PROC ARIMA is divided into three stages, corresponding to the stages described by Box and Jenkins (1976).

1. In the identification stage, you use the IDENTIFY statement to specify the response series and identify candidate ARIMA models for it. The IDENTIFY statement reads time series that are to be used in later statements, possibly differencing them, and computes autocorrelations, inverse autocorrelations, partial autocorrelations, and cross-correlations. Stationarity tests can be performed to determine if differencing is necessary. The analysis of the IDENTIFY statement output usually suggests one or more ARIMA models that could be fit. Options enable you to test for stationarity and to perform tentative ARMA order identification.

2. In the estimation and diagnostic checking stage, you use the ESTIMATE statement to specify the ARIMA model to fit to the variable specified in the previous IDENTIFY statement and to estimate the parameters of that model. The ESTIMATE statement also produces diagnostic statistics to help you judge the adequacy of the model. Significance tests for parameter estimates indicate whether some terms in the model might be unnecessary. Goodness-of-fit statistics aid in comparing this model to others. Tests for white noise residuals indicate whether the residual series contains additional information that might be used by a more complex model. The OUTLIER statement provides another useful tool to check whether the currently estimated model accounts for all the variation in the series. If the diagnostic tests indicate problems with the model, you try another model and then repeat the estimation and diagnostic checking stage.
3. In the forecasting stage, you use the FORECAST statement to forecast future values of the time series and to generate confidence intervals for these forecasts from the ARIMA model produced by the preceding ESTIMATE statement.

These three steps are explained further and illustrated through an extended example in the following sections.
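Before the extended example, the following skeleton shows how the three stages map onto PROC ARIMA statements. This is a minimal sketch: the data set, series, and model orders are placeholders, not recommendations.

proc arima data=test;
   /* Stage 1: identification; difference once and inspect correlations */
   identify var=sales(1) nlag=24;
   /* Stage 2: estimation and diagnostic checking of an ARMA(1,1)
      model for the differenced series */
   estimate p=1 q=1;
   /* Stage 3: forecast 12 periods past the end of the data */
   forecast lead=12 id=date interval=month out=results;
run;
quit;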
Identification Stage

Suppose you have a variable called SALES that you want to forecast. The following example illustrates ARIMA modeling and forecasting by using a simulated data set TEST that contains a time series SALES generated by an ARIMA(1,1,1) model. The output produced by this example is explained in the following sections. The simulated SALES series is shown in Figure 7.1.

ods graphics on;

proc sgplot data=test;
   scatter y=sales x=date;
run;
Figure 7.1 Simulated ARIMA(1,1,1) Series SALES
Using the IDENTIFY Statement

You first specify the input data set in the PROC ARIMA statement. Then, you use an IDENTIFY statement to read in the SALES series and analyze its correlation properties. You do this by using the following statements:

proc arima data=test;
   identify var=sales nlag=24;
run;
Descriptive Statistics
The IDENTIFY statement first prints descriptive statistics for the SALES series. This part of the IDENTIFY statement output is shown in Figure 7.2.

Figure 7.2  IDENTIFY Statement Descriptive Statistics Output

The ARIMA Procedure

Name of Variable = sales

Mean of Working Series      137.3662
Standard Deviation          17.36385
Number of Observations           100
Autocorrelation Function Plots
The IDENTIFY statement next produces a panel of plots used for its autocorrelation and trend analysis. The panel contains the following plots:

- the time series plot of the series
- the sample autocorrelation function plot (ACF)
- the sample inverse autocorrelation function plot (IACF)
- the sample partial autocorrelation function plot (PACF)

This correlation analysis panel is shown in Figure 7.3.
Figure 7.3 Correlation Analysis of SALES
These autocorrelation function plots show the degree of correlation with past values of the series as a function of the number of periods in the past (that is, the lag) at which the correlation is computed. The NLAG= option controls the number of lags for which the autocorrelations are shown. By default, the autocorrelation functions are plotted to lag 24. Most books on time series analysis explain how to interpret the autocorrelation and the partial autocorrelation plots. See the section “The Inverse Autocorrelation Function” on page 243 for a discussion of the inverse autocorrelation plots. By examining these plots, you can judge whether the series is stationary or nonstationary. In this case, a visual inspection of the autocorrelation function plot indicates that the SALES series is nonstationary, since the ACF decays very slowly. For more formal stationarity tests, use the STATIONARITY= option. (See the section “Stationarity” on page 213.)
White Noise Test
The last part of the default IDENTIFY statement output is the check for white noise. This is an approximate statistical test of the hypothesis that none of the autocorrelations of the series up to a given lag are significantly different from 0. If this is true for all lags, then there is no information in the series to model, and no ARIMA model is needed for the series. The autocorrelations are checked in groups of six, and the number of lags checked depends on the NLAG= option. The check for white noise output is shown in Figure 7.4.

Figure 7.4  IDENTIFY Statement Check for White Noise

Autocorrelation Check for White Noise

To Lag    Chi-Square    DF    Pr > ChiSq
     6        426.44     6        <.0001
    12        547.82    12        <.0001
    18        554.70    18        <.0001
    24        585.73    24        <.0001
PLOTS< (global-plot-options) > < = (plot-request < (options) > . . . ) >
controls the plots produced through ODS Graphics. When you specify only one plot request, you can omit the parentheses around the plot request. Here are some examples:

plots=none
plots=all
plots(unpack)=series(corr crosscorr)
plots(only)=(series(corr crosscorr) residual(normal smooth))
You must enable ODS Graphics before requesting plots as shown in the following statements. For general information about ODS Graphics, see Chapter 21, "Statistical Graphics Using ODS" (SAS/STAT User's Guide). If you have enabled ODS Graphics but do not specify any specific plot request, then the default plots associated with each of the PROC ARIMA statements used in the program are produced. The old line printer plots are suppressed when ODS Graphics is enabled.

ods graphics on;

proc arima;
   identify var=y(1 12);
   estimate q=(1)(12) noint;
run;
Since no specific plot is requested in this program, the default plots associated with the identification and estimation stages are produced.

Global Plot Options:
The global-plot-options apply to all relevant plots generated by the ARIMA procedure. The following global-plot-options are supported:

ONLY
suppresses the default plots. Only the plots specifically requested are produced.

UNPACK
breaks a graphic that is otherwise paneled into individual component plots.

Specific Plot Options:
The following list describes the specific plots and their options.

ALL
produces all plots appropriate for the particular analysis.
NONE
suppresses all plots.

SERIES(< series-plot-options >)
produces plots associated with the identification stage of the modeling. The panel plots corresponding to the CORR and CROSSCORR options are produced by default. The following series-plot-options are available:

   ACF
   produces the plot of autocorrelations.

   ALL
   produces all the plots associated with the identification stage.

   CORR
   produces a panel of plots that are useful in the trend and correlation analysis of the series. The panel consists of the following:
   - the time series plot
   - the series-autocorrelation plot
   - the series-partial-autocorrelation plot
   - the series-inverse-autocorrelation plot

   CROSSCORR
   produces panels of cross-correlation plots.

   IACF
   produces the plot of inverse-autocorrelations.

   PACF
   produces the plot of partial-autocorrelations.

RESIDUAL(< residual-plot-options >)
produces the residuals plots. The residual correlation and normality diagnostic panels are produced by default. The following residual-plot-options are available:

   ACF
   produces the plot of residual autocorrelations.

   ALL
   produces all the residual diagnostics plots appropriate for the particular analysis.

   CORR
   produces a summary panel of the residual correlation diagnostics that consists of the following:
   - the residual-autocorrelation plot
   - the residual-partial-autocorrelation plot
   - the residual-inverse-autocorrelation plot
   - a plot of Ljung-Box white-noise test p-values at different lags

   HIST
   produces the histogram of the residuals.

   IACF
   produces the plot of residual inverse-autocorrelations.

   NORMAL
   produces a summary panel of the residual normality diagnostics that consists of the following:
   - histogram of the residuals
   - normal quantile plot of the residuals

   PACF
   produces the plot of residual partial-autocorrelations.

   QQ
   produces the normal quantile plot of the residuals.

   SMOOTH
   produces a scatter plot of the residuals against time, which has an overlaid smooth fit.

   WN
   produces the plot of Ljung-Box white-noise test p-values at different lags.

FORECAST(< forecast-plot-options >)
produces the forecast plots in the forecasting stage. The forecast-only plot that shows the multistep forecasts in the forecast region is produced by default. The following forecast-plot-options are available:

   ALL
   produces the forecast-only plot as well as the forecast plot.

   FORECAST
   produces a plot that shows the one-step-ahead forecasts as well as the multistep-ahead forecasts.

   FORECASTONLY
   produces a plot that shows only the multistep-ahead forecasts in the forecast region.

OUT=SAS-data-set
specifies a SAS data set to which the forecasts are output. If different OUT= specifications appear in the PROC ARIMA and FORECAST statements, the one in the FORECAST statement is used.
BY Statement

BY variables ;
A BY statement can be used in the ARIMA procedure to process a data set in groups of observations defined by the BY variables. Note that all IDENTIFY, ESTIMATE, and FORECAST statements specified are applied to all BY groups.

Because of the need to make data-based model selections, BY-group processing is not usually done with PROC ARIMA. You usually want to use different models for the different series contained in different BY groups, and the PROC ARIMA BY statement does not let you do this.

Using a BY statement imposes certain restrictions:

- The BY statement must appear before the first RUN statement.
- If a BY statement is used, the input data must come from the data set specified in the PROC statement; that is, no input data sets can be specified in IDENTIFY statements.
- When a BY statement is used with PROC ARIMA, interactive processing applies only to the first BY group. Once the end of the PROC ARIMA step is reached, all ARIMA statements specified are executed again for each of the remaining BY groups in the input data set.
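The following sketch shows the mechanics (the data set and variables are hypothetical). Note that the same IDENTIFY, ESTIMATE, and FORECAST statements are applied, unchanged, to every BY group:

proc arima data=regional;
   by region;
   /* the same model is fit separately to each region's series */
   identify var=sales(1);
   estimate p=1 q=1;
   forecast lead=12 out=fcst;
run;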
IDENTIFY Statement

IDENTIFY VAR=variable options ;
The IDENTIFY statement specifies the time series to be modeled, differences the series if desired, and computes statistics to help identify models to fit. Use an IDENTIFY statement for each time series that you want to model. If other time series are to be used as inputs in a subsequent ESTIMATE statement, they must be listed in a CROSSCORR= list in the IDENTIFY statement.

The following options are used in the IDENTIFY statement. The VAR= option is required.

ALPHA=significance-level
specifies the significance level for tests in the IDENTIFY statement. The default is 0.05.

CENTER
centers each time series by subtracting its sample mean. The analysis is done on the centered data. Later, when forecasts are generated, the mean is added back. Note that centering is done after differencing. The CENTER option is normally used in conjunction with the NOCONSTANT option of the ESTIMATE statement.

CLEAR
deletes all old models. This option is useful when you want to delete old models so that the input variables are not prewhitened. (See the section "Prewhitening" on page 250 for more information.)
CROSSCORR=variable (d11, d12, . . . , d1k)
CROSSCORR=(variable (d11, d12, . . . , d1k) . . . variable (d21, d22, . . . , d2k))
names the variables cross-correlated with the response variable given by the VAR= specification. Each variable name can be followed by a list of differencing lags in parentheses, the same as for the VAR= specification. If differencing is specified for a variable in the CROSSCORR= list, the differenced series is cross-correlated with the VAR= option series, and the differenced series is used when the ESTIMATE statement INPUT= option refers to the variable.
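For example, the following hypothetical statements difference both the response and two candidate inputs at lag 1 and cross-correlate the differenced series:

proc arima data=test;
   /* changes in y are cross-correlated with changes in x1 and x2 */
   identify var=y(1) crosscorr=( x1(1) x2(1) ) nlag=12;
run;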
DATA=SAS-data-set
specifies the input SAS data set that contains the time series. If the DATA= option is omitted, the DATA= data set specified in the PROC ARIMA statement is used; if the DATA= option is omitted from the PROC ARIMA statement as well, the most recently created data set is used.
ESACF
computes the extended sample autocorrelation function and uses these estimates to tentatively identify the autoregressive and moving-average orders of mixed models. The ESACF option generates two tables. The first table displays extended sample autocorrelation estimates, and the second table displays probability values that can be used to test the significance of these estimates. The P=(p_min : p_max) and Q=(q_min : q_max) options determine the size of the table.

The autoregressive and moving-average orders are tentatively identified by finding a triangular pattern in which all values are insignificant. The ARIMA procedure finds these patterns based on the IDENTIFY statement ALPHA= option and displays possible recommendations for the orders. The following code generates an ESACF table with dimensions of p=(0:7) and q=(0:8).

proc arima data=test;
   identify var=x esacf p=(0:7) q=(0:8);
run;

See the section "The ESACF Method" on page 245 for more information.

MINIC
uses information criteria or penalty functions to provide tentative ARMA order identification. The MINIC option generates a table that contains the computed information criterion associated with various ARMA model orders. The PERROR=(p_ε,min : p_ε,max) option determines the range of the autoregressive model orders used to estimate the error series. The P=(p_min : p_max) and Q=(q_min : q_max) options determine the size of the table. The ARMA orders are tentatively identified by those orders that minimize the information criterion. The following statements generate a MINIC table with default dimensions of p=(0:5) and q=(0:5) and with the error series estimated by an autoregressive model with an order, p_ε, that minimizes the AIC in the range from 8 to 11.
proc arima data=test;
   identify var=x minic perror=(8:11);
run;

See the section "The MINIC Method" on page 246 for more information.

NLAG=number
indicates the number of lags to consider in computing the autocorrelations and cross-correlations. To obtain preliminary estimates of an ARIMA(p, d, q) model, the NLAG= value must be at least p + q + d. The number of observations must be greater than or equal to the NLAG= value. The default value for NLAG= is 24 or one-fourth the number of observations, whichever is less. Even though the NLAG= value is specified, the NLAG= value can be changed according to the data set.

NOMISS
uses only the first continuous sequence of data with no missing values. By default, all observations are used.

NOPRINT
suppresses the normal printout (including the correlation plots) generated by the IDENTIFY statement.

OUTCOV=SAS-data-set
writes the autocovariances, autocorrelations, inverse autocorrelations, partial autocorrelations, and cross covariances to an output SAS data set. If the OUTCOV= option is not specified, no covariance output data set is created. See the section "OUTCOV= Data Set" on page 267 for more information.

P=(p_min : p_max)
see the ESACF, MINIC, and SCAN options for details.

PERROR=(p_ε,min : p_ε,max)
determines the range of the autoregressive model orders used to estimate the error series in MINIC, a tentative ARMA order identification method. See the section "The MINIC Method" on page 246 for more information. By default p_ε,min is set to p_max and p_ε,max is set to p_max + q_max, where p_max and q_max are the maximum settings of the P= and Q= options on the IDENTIFY statement.

Q=(q_min : q_max)
see the ESACF, MINIC, and SCAN options for details.

SCAN
computes estimates of the squared canonical correlations and uses these estimates to tentatively identify the autoregressive and moving-average orders of mixed models. The SCAN option generates two tables. The first table displays squared canonical correlation estimates, and the second table displays probability values that can be used to test the significance of these estimates. The P=(p_min : p_max) and Q=(q_min : q_max) options determine the size of each table.
The autoregressive and moving-average orders are tentatively identified by finding a rectangular pattern in which all values are insignificant. The ARIMA procedure finds these patterns based on the IDENTIFY statement ALPHA= option and displays possible recommendations for the orders. The following code generates a SCAN table with default dimensions of p=(0:5) and q=(0:5). The recommended orders are based on a significance level of 0.1.

proc arima data=test;
   identify var=x scan alpha=0.1;
run;

See the section "The SCAN Method" on page 248 for more information.

STATIONARITY=
performs stationarity tests. Stationarity tests can be used to determine whether differencing terms should be included in the model specification. In each stationarity test, the autoregressive orders can be specified by a range, test=ar_max, or as a list of values, test=(ar_1, . . . , ar_n), where test is ADF, PP, or RW. The default is (0,1,2). See the section "Stationarity Tests" on page 250 for more information.

STATIONARITY=(ADF=AR orders DLAG=s)
STATIONARITY=(DICKEY=AR orders DLAG=s)
performs augmented Dickey-Fuller tests. If the DLAG=s option is specified with s greater than one, seasonal Dickey-Fuller tests are performed. The maximum allowable value of s is 12. The default value of s is 1. The following code performs augmented Dickey-Fuller tests with autoregressive orders 2 and 5.

proc arima data=test;
   identify var=x stationarity=(adf=(2,5));
run;

STATIONARITY=(PP=AR orders)
STATIONARITY=(PHILLIPS=AR orders)
performs Phillips-Perron tests. The following statements perform augmented Phillips-Perron tests with autoregressive orders ranging from 0 to 6.

proc arima data=test;
   identify var=x stationarity=(pp=6);
run;

STATIONARITY=(RW=AR orders)
STATIONARITY=(RANDOMWALK=AR orders)
performs random-walk-with-drift tests. The following statements perform random-walk-with-drift tests with autoregressive orders ranging from 0 to 2.
proc arima data=test;
   identify var=x stationarity=(rw);
run;

VAR=variable
VAR=variable (d1, d2, . . . , dk)
names the variable that contains the time series to analyze. The VAR= option is required. A list of differencing lags can be placed in parentheses after the variable name to request that the series be differenced at these lags. For example, VAR=X(1) takes the first differences of X. VAR=X(1,1) requests that X be differenced twice, both times with lag 1, producing a second difference series, which is (X_t − X_{t−1}) − (X_{t−1} − X_{t−2}) = X_t − 2X_{t−1} + X_{t−2}. VAR=X(2) differences X once at lag two (X_t − X_{t−2}).
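For example, a monthly series can be differenced at lag 1 and at lag 12 to remove both trend and seasonality, a common pattern for seasonal data (whether it is appropriate depends on your series):

identify var=sales(1,12);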
If differencing is specified, it is the differenced series that is processed by any subsequent ESTIMATE statement.

WHITENOISE=ST | IGNOREMISS
specifies the type of test statistic that is used in the white noise test of the series when the series contains missing values. If WHITENOISE=IGNOREMISS, the standard Ljung-Box test statistic is used. If WHITENOISE=ST, a modification of this statistic suggested by Stoffer and Toloi (1992) is used. The default is WHITENOISE=ST.
ESTIMATE Statement

< label: > ESTIMATE options ;
The ESTIMATE statement specifies an ARMA model or transfer function model for the response variable specified in the previous IDENTIFY statement, and produces estimates of its parameters. The ESTIMATE statement also prints diagnostic information by which to check the model. The label in the ESTIMATE statement is optional. Include an ESTIMATE statement for each model that you want to estimate. Options used in the ESTIMATE statement are described in the following sections.
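Labels are convenient when you fit several candidate models to the same series, as in the following sketch (the model orders are illustrative only):

identify var=sales(1);
arma11: estimate p=1 q=1;
ma2:    estimate q=2;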
Options for Defining the Model and Controlling Diagnostic Statistics

The following options are used to define the model to be estimated and to control the output that is printed.

ALTPARM
specifies the alternative parameterization of the overall scale of transfer functions in the model. See the section "Alternative Model Parameterization" on page 257 for details.
INPUT=variable
INPUT=(transfer-function variable . . . )
specifies input variables and their transfer functions. The variables used on the INPUT= option must be included in the CROSSCORR= list in the previous IDENTIFY statement. If any differencing is specified in the CROSSCORR= list, then the differenced series is used as the input to the transfer function.

The transfer function specification for an input variable is optional. If no transfer function is specified, the input variable enters the model as a simple regressor. If specified, the transfer function specification has the following syntax:

S $ (L1,1, L1,2, . . .)(L2,1, . . .) . . . / (Lj,1, . . .) . . .

Here, S is a shift or lag of the input variable, the terms before the slash (/) are numerator factors, and the terms after the slash (/) are denominator factors of the transfer function. All three parts are optional. See the section "Specifying Inputs and Transfer Functions" on page 256 for details.
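For example, the following hypothetical specification applies a transfer function to the input X with a pure delay of three periods, one numerator lag, and two denominator lags; the response and input names are placeholders:

proc arima data=test;
   identify var=y crosscorr=x;
   /* 3$ = shift (delay) of 3 periods; (1) = numerator lag 1;
      /(1,2) = denominator lags 1 and 2 */
   estimate p=1 input=( 3$(1)/(1,2) x );
run;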
METHOD=value
specifies the estimation method to use. METHOD=ML specifies the maximum likelihood method. METHOD=ULS specifies the unconditional least squares method. METHOD=CLS specifies the conditional least squares method. METHOD=CLS is the default. See the section "Estimation Details" on page 252 for more information.

NOCONSTANT
NOINT
suppresses the fitting of a constant (or intercept) parameter in the model. (That is, the parameter μ is omitted.)

NODF
estimates the variance by dividing the error sum of squares (SSE) by the number of residuals. The default is to divide the SSE by the number of residuals minus the number of free parameters in the model.

NOPRINT
suppresses the normal printout generated by the ESTIMATE statement. If the NOPRINT option is specified for the ESTIMATE statement, then any error and warning messages are printed to the SAS log.

P=order
P=(lag, . . . , lag) . . . (lag, . . . , lag)
specifies the autoregressive part of the model. By default, no autoregressive parameters are fit. P=(l1, l2, . . . , lk) defines a model with autoregressive parameters at the specified lags. P=order is equivalent to P=(1, 2, . . . , order). A concatenation of parenthesized lists specifies a factored model. For example, P=(1,2,5)(6,12) specifies the autoregressive model

(1 − φ1,1 B − φ1,2 B² − φ1,3 B⁵)(1 − φ2,1 B⁶ − φ2,2 B¹²)
PLOT
plots the residual autocorrelation functions. The sample autocorrelation, the sample inverse autocorrelation, and the sample partial autocorrelation functions of the model residuals are plotted.

Q=order
Q=(lag, . . . , lag) . . . (lag, . . . , lag)
specifies the moving-average part of the model. By default, no moving-average part is included in the model. Q=(l1, l2, . . . , lk) defines a model with moving-average parameters at the specified lags. Q=order is equivalent to Q=(1, 2, . . . , order). A concatenation of parenthesized lists specifies a factored model. The interpretation of factors and lags is the same as for the P= option.

WHITENOISE=ST | IGNOREMISS
specifies the type of test statistic that is used in the white noise test of the series when the series contains missing values. If WHITENOISE=IGNOREMISS, the standard Ljung-Box test statistic is used. If WHITENOISE=ST, a modification of this statistic suggested by Stoffer and Toloi (1992) is used. The default is WHITENOISE=ST.
Options for Output Data Sets

The following options are used to store results in SAS data sets:

OUTEST=SAS-data-set
writes the parameter estimates to an output data set. If the OUTCORR or OUTCOV option is used, the correlations or covariances of the estimates are also written to the OUTEST= data set. See the section "OUTEST= Data Set" on page 267 for a description of the OUTEST= output data set.

OUTCORR
writes the correlations of the parameter estimates to the OUTEST= data set.

OUTCOV
writes the covariances of the parameter estimates to the OUTEST= data set.

OUTMODEL=SAS-data-set
writes the model and parameter estimates to an output data set. If OUTMODEL= is not specified, no model output data set is created. See the section "OUTMODEL= SAS Data Set" on page 270 for a description of the OUTMODEL= output data set.

OUTSTAT=SAS-data-set
writes the model diagnostic statistics to an output data set. If OUTSTAT= is not specified, no statistics output data set is created. See the section "OUTSTAT= Data Set" on page 272 for a description of the OUTSTAT= output data set.
Options to Specify Parameter Values

The following options enable you to specify values for the model parameters. These options can provide starting values for the estimation process, or you can specify fixed parameters for use in the FORECAST stage and suppress the estimation process with the NOEST option. By default, the ARIMA procedure finds initial parameter estimates and uses these estimates as starting values in the iterative estimation process. If values for any parameters are specified, values for all parameters should be given. The number of values given must agree with the model specifications.

AR=value . . .
lists starting values for the autoregressive parameters. See the section "Initial Values" on page 258 for more information.

INITVAL=(initializer-spec variable . . . )
specifies starting values for the parameters in the transfer function parts of the model. See the section "Initial Values" on page 258 for more information.

MA=value . . .
lists starting values for the moving-average parameters. See the section "Initial Values" on page 258 for more information.

MU=value
specifies the MU parameter.

NOEST
uses the values specified with the AR=, MA=, INITVAL=, and MU= options as final parameter values. The estimation process is suppressed except for estimation of the residual variance. The specified parameter values are used directly by the next FORECAST statement. When NOEST is specified, standard errors, t values, and the correlations between estimates are displayed as 0 or missing. (The NOEST option is useful, for example, when you want to generate forecasts that correspond to a published model.)
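For example, to generate forecasts from a published ARIMA(1,1,1) model without re-estimating it, you might fix the parameters as follows. The numeric values here are placeholders, not a real published model:

proc arima data=test;
   identify var=sales(1);
   /* fix the mean, AR, and MA parameters; no estimation is done */
   estimate p=1 q=1 mu=0.5 ar=0.8 ma=0.4 noest;
   forecast lead=12;
run;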
Options to Control the Iterative Estimation Process

The following options can be used to control the iterative process of minimizing the error sum of squares or maximizing the log-likelihood function. These tuning options are not usually needed but can be useful if convergence problems arise.

BACKLIM=n
omits the specified number of initial residuals from the sum of squares or likelihood function. Omitting values can be useful for suppressing transients in transfer function models that are sensitive to start-up values.

CONVERGE=value
specifies the convergence criterion. Convergence is assumed when the largest change in the estimate for any parameter is less than the CONVERGE= option value. If the absolute value of
the parameter estimate is greater than 0.01, the relative change is used; otherwise, the absolute change in the estimate is used. The default is CONVERGE=0.001.

DELTA=value
specifies the perturbation value for computing numerical derivatives. The default is DELTA=0.001.
GRID
prints the error sum of squares (SSE) or concentrated log-likelihood surface in a small grid of the parameter space around the final estimates. For each pair of parameters, the SSE is printed for the nine parameter-value combinations formed by the grid, with a center at the final estimates and with spacing given by the GRIDVAL= specification. The GRID option can help you judge whether the estimates are truly at the optimum, since the estimation process does not always converge. For models with a large number of parameters, the GRID option produces voluminous output.

GRIDVAL=number
controls the spacing in the grid printed by the GRID option. The default is GRIDVAL=0.005.

MAXITER=n
MAXIT=n
specifies the maximum number of iterations allowed. The default is MAXITER=50.

NOLS
begins the maximum likelihood or unconditional least squares iterations from the preliminary estimates rather than from the conditional least squares estimates that are produced after four iterations. See the section "Estimation Details" on page 252 for more information.

NOSTABLE
specifies that the autoregressive and moving-average parameter estimates for the noise part of the model not be restricted to the stationary and invertible regions, respectively. See the section "Stationarity and Invertibility" on page 259 for more information.

PRINTALL
prints preliminary estimation results and the iterations in the final estimation process.

NOTFSTABLE
specifies that the parameter estimates for the denominator polynomial of the transfer function part of the model not be restricted to the stability region. See the section "Stationarity and Invertibility" on page 259 for more information.

SINGULAR=value
specifies the criterion for checking singularity. If a pivot of a sweep operation is less than the SINGULAR= value, the matrix is deemed singular. Sweep operations are performed on the Jacobian matrix during final estimation and on the covariance matrix when preliminary estimates are obtained. The default is SINGULAR=1E-7.
OUTLIER Statement

OUTLIER options ;
The OUTLIER statement can be used to detect shifts in the level of the response series that are not accounted for by the previously estimated model. An ESTIMATE statement must precede the OUTLIER statement. The following options are used in the OUTLIER statement:

TYPE=ADDITIVE
TYPE=SHIFT
TYPE=TEMP (d1, . . . , dk)
TYPE=(< ADDITIVE > < SHIFT > < TEMP (d1, . . . , dk) >)
specifies the types of level shifts to search for. The default is TYPE=(ADDITIVE SHIFT), which requests searching for additive outliers and permanent level shifts. The option TEMP(d1, . . . , dk) requests searching for temporary changes in the level of durations d1, . . . , dk. These options can also be abbreviated as AO, LS, and TC.

ALPHA=significance-level
specifies the significance level for tests in the OUTLIER statement. The default is 0.05.

SIGMA=ROBUST | MSE
specifies the type of error variance estimate to use in the statistical tests performed during the outlier detection. SIGMA=MSE corresponds to the usual mean squared error (MSE) estimate, and SIGMA=ROBUST corresponds to a robust estimate of the error variance. The default is SIGMA=ROBUST.

MAXNUM=number
limits the number of outliers to search. The default is MAXNUM=5.

MAXPCT=number
limits the number of outliers to search for according to a percentage of the series length. The default is MAXPCT=2. When both the MAXNUM= and MAXPCT= options are specified, the minimum of the two search numbers is used.

ID=Date-Time ID variable
specifies a SAS date, time, or datetime identification variable to label the detected outliers. This variable must be present in the input data set.

The following examples illustrate a few possibilities for the OUTLIER statement. The most basic usage, shown as follows, sets all the options to their default values.

outlier;
That is, it is equivalent to

outlier type=(ao ls) alpha=0.05 sigma=robust maxnum=5 maxpct=2;
The following statement requests a search for permanent level shifts and for temporary level changes of durations 6 and 12. The search is limited to at most three changes, and the significance level of the underlying tests is 0.001. MSE is used as the estimate of the error variance. The statement also requests labeling of the detected shifts by using the ID variable date.

outlier type=(ls tc(6 12)) alpha=0.001 sigma=mse maxnum=3 id=date;
FORECAST Statement

FORECAST options ;
The FORECAST statement generates forecast values for a time series by using the parameter estimates produced by the previous ESTIMATE statement. See the section "Forecasting Details" on page 260 for more information about calculating forecasts. The following options can be used in the FORECAST statement:

ALIGN=option
controls the alignment of SAS dates used to identify output observations. The ALIGN= option allows the following values: BEGINNING|BEG|B, MIDDLE|MID|M, and ENDING|END|E. BEGINNING is the default.

ALPHA=n
sets the size of the forecast confidence limits. The ALPHA= value must be between 0 and 1. When you specify ALPHA=α, the upper and lower confidence limits have a 1 − α confidence level. The default is ALPHA=0.05, which produces 95% confidence intervals. ALPHA values are rounded to the nearest hundredth.

BACK=n
specifies the number of observations before the end of the data where the multistep forecasts are to begin. The BACK= option value must be less than or equal to the number of observations minus the number of parameters. The default is BACK=0, which means that the forecast starts at the end of the available data. The end of the data is the last observation for which a noise value can be calculated. If there are no input series, the end of the data is the last nonmissing value of the response time series. If there are input series, this observation can precede the last nonmissing value of the response variable, since there may be missing values for some of the input series.

ID=variable
names a variable in the input data set that identifies the time periods associated with the observations. The ID= variable is used in conjunction with the INTERVAL= option to extrapolate ID values from the end of the input data to identify forecast periods in the OUT= data set. If the INTERVAL= option specifies an interval type, the ID variable must be a SAS date or datetime variable with the spacing between observations indicated by the INTERVAL= value.
If the INTERVAL= option is not used, the last input value of the ID= variable is incremented by one for each forecast period to extrapolate the ID values for forecast observations.

INTERVAL=interval
INTERVAL=n
specifies the time interval between observations. See Chapter 4, "Date Intervals, Formats, and Functions," for information about valid INTERVAL= values. The value of the INTERVAL= option is used by PROC ARIMA to extrapolate the ID values for forecast observations and to check that the input data are in order with no missing periods. See the section "Specifying Series Periodicity" on page 263 for more details.

LEAD=n
specifies the number of multistep forecast values to compute. For example, if LEAD=10, PROC ARIMA forecasts for ten periods beginning with the end of the input series (or earlier if BACK= is specified). It is possible to obtain fewer than the requested number of forecasts if a transfer function model is specified and insufficient data are available to compute the forecast. The default is LEAD=24.

NOOUTALL
includes only the final forecast observations in the OUT= output data set, not the one-step forecasts for the data before the forecast period.

NOPRINT
suppresses the normal printout of the forecast and associated values.

OUT=SAS-data-set
writes the forecast (and other values) to an output data set. If OUT= is not specified, the OUT= data set specified in the PROC ARIMA statement is used. If OUT= is also not specified in the PROC ARIMA statement, no output data set is created. See the section "OUT= Data Set" on page 265 for more information.

PRINTALL
prints the FORECAST computation throughout the whole data set. The forecast values for the data before the forecast period (specified by the BACK= option) are one-step forecasts.

SIGSQ=value
specifies the variance term used in the formula for computing forecast standard errors and confidence limits. The default value is the variance estimate computed by the preceding ESTIMATE statement. This option is useful when you wish to generate forecast standard errors and confidence limits based on a published model. It would often be used in conjunction with the NOEST option in the preceding ESTIMATE statement.
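A FORECAST statement typically combines several of these options. The following sketch (the data set and variable names are placeholders) holds out the last 12 observations with BACK=12 so that the multistep forecasts can be compared against actual data, and stores the results in a data set:

proc arima data=test;
   identify var=sales(1);
   estimate p=1 q=1;
   /* start the multistep forecasts 12 periods before the end of the data */
   forecast lead=24 back=12 id=date interval=month out=fc;
run;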
Details: ARIMA Procedure
The Inverse Autocorrelation Function

The sample inverse autocorrelation function (SIACF) plays much the same role in ARIMA modeling as the sample partial autocorrelation function (SPACF), but it generally indicates subset and seasonal autoregressive models better than the SPACF. Additionally, the SIACF can be useful for detecting over-differencing. If the data come from a nonstationary or nearly nonstationary model, the SIACF has the characteristics of a noninvertible moving-average. Likewise, if the data come from a model with a noninvertible moving average, then the SIACF has nonstationary characteristics and therefore decays slowly. In particular, if the data have been over-differenced, the SIACF looks like a SACF from a nonstationary process.

The inverse autocorrelation function is not often discussed in textbooks, so a brief description is given here. More complete discussions can be found in Cleveland (1972), Chatfield (1980), and Priestly (1981).

Let W_t be generated by the ARMA(p, q) process

φ(B)W_t = θ(B)a_t

where a_t is a white noise sequence. If θ(B) is invertible (that is, if θ considered as a polynomial in B has no roots less than or equal to 1 in magnitude), then the model

θ(B)Z_t = φ(B)a_t

is also a valid ARMA(q, p) model. This model is sometimes referred to as the dual model. The autocorrelation function (ACF) of this dual model is called the inverse autocorrelation function (IACF) of the original model.

Notice that if the original model is a pure autoregressive model, then the IACF is an ACF that corresponds to a pure moving-average model. Thus, it cuts off sharply when the lag is greater than p; this behavior is similar to the behavior of the partial autocorrelation function (PACF).

The sample inverse autocorrelation function (SIACF) is estimated in the ARIMA procedure by the following steps. A high-order autoregressive model is fit to the data by means of the Yule-Walker equations. The order of the autoregressive model used to calculate the SIACF is the minimum of the NLAG= value and one-half the number of observations after differencing. The SIACF is then calculated as the autocorrelation function that corresponds to this autoregressive operator when treated as a moving-average operator. That is, the autoregressive coefficients are convolved with themselves and treated as autocovariances.

Under certain conditions, the sampling distribution of the SIACF can be approximated by the sampling distribution of the SACF of the dual model (Bhansali 1980). In the plots generated by ARIMA, the confidence limit marks (.) are located at ±2/√n. These limits bound an approximate 95% confidence interval for the hypothesis that the data are from a white noise process.
The Partial Autocorrelation Function

The approximation for a standard error for the estimated partial autocorrelation function at lag k is based on a null hypothesis that a pure autoregressive Gaussian process of order k−1 generated the time series. This standard error is 1/√n and is used to produce the approximate 95% confidence intervals depicted by the dots in the plot.
The Cross-Correlation Function

The autocorrelation and partial and inverse autocorrelation functions described in the preceding sections help when you want to model a series as a function of its past values and past random errors. When you want to include the effects of past and current values of other series in the model, the correlations of the response series and the other series must be considered.

The CROSSCORR= option in the IDENTIFY statement computes cross-correlations of the VAR= series with other series and makes these series available for use as inputs in models specified by later ESTIMATE statements. When the CROSSCORR= option is used, PROC ARIMA prints a plot of the cross-correlation function for each variable in the CROSSCORR= list. This plot is similar in format to the other correlation plots, but it shows the correlation between the two series at both lags and leads. For example,

identify var=y crosscorr=x ...;
plots the cross-correlation function of Y and X, Cor(y_t, x_{t−s}), for s = −L to L, where L is the value of the NLAG= option. Study of the cross-correlation functions can indicate the transfer functions through which the input series should enter the model for the response series.

The cross-correlation function is computed after any specified differencing has been done. If differencing is specified for the VAR= variable or for a variable in the CROSSCORR= list, it is the differenced series that is cross-correlated (and the differenced series is processed by any following ESTIMATE statement). For example,

identify var=y(1) crosscorr=x(1);
computes the cross-correlations of the changes in Y with the changes in X. When differencing is specified, the subsequent ESTIMATE statement models changes in the variables rather than the variables themselves.
The ESACF Method

The extended sample autocorrelation function (ESACF) method can tentatively identify the orders of a stationary or nonstationary ARMA process based on iterated least squares estimates of the autoregressive parameters. Tsay and Tiao (1984) proposed the technique, and Choi (1992) provides useful descriptions of the algorithm.

Given a stationary or nonstationary time series $\{z_t : 1 \le t \le n\}$ with mean corrected form $\tilde{z}_t = z_t - \mu_z$ with a true autoregressive order of $p + d$ and with a true moving-average order of $q$, you can use the ESACF method to estimate the unknown orders $p + d$ and $q$ by analyzing the autocorrelation functions associated with filtered series of the form

$$w_t^{(m,j)} = \hat{\Phi}^{(m,j)}(B)\tilde{z}_t = \tilde{z}_t - \sum_{i=1}^{m} \hat{\phi}_i^{(m,j)} \tilde{z}_{t-i}$$
where $B$ represents the backshift operator, where $m = p_{min}, \ldots, p_{max}$ are the autoregressive test orders, where $j = q_{min}+1, \ldots, q_{max}+1$ are the moving-average test orders, and where $\hat{\phi}_i^{(m,j)}$ are the autoregressive parameter estimates under the assumption that the series is an ARMA(m, j) process.

For purely autoregressive models (j = 0), ordinary least squares (OLS) is used to consistently estimate $\hat{\phi}_i^{(m,0)}$. For ARMA models, consistent estimates are obtained by the iterated least squares recursion formula, which is initiated by the pure autoregressive estimates:

$$\hat{\phi}_i^{(m,j)} = \hat{\phi}_i^{(m+1,j-1)} - \hat{\phi}_{i-1}^{(m,j-1)} \frac{\hat{\phi}_{m+1}^{(m+1,j-1)}}{\hat{\phi}_{m}^{(m,j-1)}}$$

The $j$th lag of the sample autocorrelation function of the filtered series $w_t^{(m,j)}$ is the extended sample autocorrelation function, and it is denoted as $r_j^{(m)} = r_j(w^{(m,j)})$.

The standard errors of $r_j^{(m)}$ are computed in the usual way by using Bartlett's approximation of the variance of the sample autocorrelation function, $\mathrm{var}(r_j^{(m)}) \approx \left(1 + \sum_{t=1}^{j-1} r_t^2(w^{(m,j)})\right)$.
If the true model is an ARMA($p+d$, $q$) process, the filtered series $w_t^{(p+d,j)}$ follows an MA($q$) model for $j \ge q$, so that

$$r_j^{(p+d)} \approx 0 \qquad j > q$$
$$r_j^{(p+d)} \ne 0 \qquad j = q$$

Additionally, Tsay and Tiao (1984) show that the extended sample autocorrelation satisfies

$$r_j^{(m)} \approx 0 \qquad\qquad\qquad j - q > m - p - d \ge 0$$
$$r_j^{(m)} \ne c(m - p - d,\, j - q) \qquad 0 \le j - q \le m - p - d$$

where $c(m - p - d, j - q)$ is a nonzero constant or a continuous random variable bounded by −1 and 1.
An ESACF table is then constructed by using the $r_j^{(m)}$ for $m = p_{min}, \ldots, p_{max}$ and $j = q_{min}+1, \ldots, q_{max}+1$ to identify the ARMA orders (see Table 7.4). The orders are tentatively identified by finding a right (maximal) triangular pattern with vertices located at $(p+d, q)$ and $(p+d, q_{max})$ and in which all elements are insignificant (based on asymptotic normality of the autocorrelation function). The vertex $(p+d, q)$ identifies the order. Table 7.5 depicts the theoretical pattern associated with an ARMA(1,2) series.

Table 7.4  ESACF Table

               MA
   AR   0        1        2        3
   0    r1(0)    r2(0)    r3(0)    r4(0)
   1    r1(1)    r2(1)    r3(1)    r4(1)
   2    r1(2)    r2(2)    r3(2)    r4(2)
   3    r1(3)    r2(3)    r3(3)    r4(3)

Table 7.5  Theoretical ESACF Table for an ARMA(1,2) Series

               MA
   AR   0  1  2  3  4  5  6  7
   0    *  X  X  X  X  X  X  X
   1    *  X  0  0  0  0  0  0
   2    *  X  X  0  0  0  0  0
   3    *  X  X  X  0  0  0  0
   4    *  X  X  X  X  0  0  0

   X = significant terms
   0 = insignificant terms
   * = no pattern
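As a sketch of how an ESACF table is requested in practice (the data set name, variable name, and test-order ranges are illustrative assumptions), the ESACF option in the IDENTIFY statement tabulates the $r_j^{(m)}$ for the specified ranges of test orders:

   proc arima data=work.series;
      /* Tabulate the ESACF for AR test orders 0-7 and MA test orders   */
      /* 0-7, and report the tentative ARMA orders suggested by the     */
      /* triangular pattern of insignificant values.                    */
      identify var=y esacf p=(0:7) q=(0:7);
   run;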
The MINIC Method

The minimum information criterion (MINIC) method can tentatively identify the order of a stationary and invertible ARMA process. Note that Hannan and Rissanen (1982) proposed this method, and Box, Jenkins, and Reinsel (1994) and Choi (1992) provide useful descriptions of the algorithm.

Given a stationary and invertible time series $\{z_t : 1 \le t \le n\}$ with mean corrected form $\tilde{z}_t = z_t - \mu_z$ with a true autoregressive order of $p$ and with a true moving-average order of $q$, you can use the MINIC method to compute information criteria (or penalty functions) for various autoregressive and moving-average orders. The following paragraphs provide a brief description of the algorithm.
If the series is a stationary and invertible ARMA(p, q) process of the form

$$\Phi_{(p,q)}(B)\tilde{z}_t = \Theta_{(p,q)}(B)\epsilon_t$$

the error series can be approximated by a high-order AR process

$$\hat{\epsilon}_t = \hat{\Phi}_{(p_\epsilon,q)}(B)\tilde{z}_t \approx \epsilon_t$$

where the parameter estimates $\hat{\Phi}_{(p_\epsilon,q)}$ are obtained from the Yule-Walker estimates. The choice of the autoregressive order $p_\epsilon$ is determined by the order that minimizes the Akaike information criterion (AIC) in the range $p_{\epsilon,min} \le p_\epsilon \le p_{\epsilon,max}$

$$\mathrm{AIC}(p_\epsilon, 0) = \ln(\tilde{\sigma}^2_{(p_\epsilon,0)}) + 2(p_\epsilon + 0)/n$$

where

$$\tilde{\sigma}^2_{(p_\epsilon,0)} = \frac{1}{n}\sum_{t=p_\epsilon+1}^{n} \hat{\epsilon}_t^2$$
Note that Hannan and Rissanen (1982) use the Bayesian information criterion (BIC) to determine the autoregressive order used to estimate the error series. Box, Jenkins, and Reinsel (1994) and Choi (1992) recommend the AIC.

Once the error series has been estimated for autoregressive test order $m = p_{min}, \ldots, p_{max}$ and for moving-average test order $j = q_{min}, \ldots, q_{max}$, the OLS estimates $\hat{\Phi}_{(m,j)}$ and $\hat{\Theta}_{(m,j)}$ are computed from the regression model

$$\tilde{z}_t = \sum_{i=1}^{m} \phi_i^{(m,j)} \tilde{z}_{t-i} + \sum_{k=1}^{j} \theta_k^{(m,j)} \hat{\epsilon}_{t-k} + \mathrm{error}$$
From the preceding parameter estimates, the BIC is then computed

$$\mathrm{BIC}(m, j) = \ln(\tilde{\sigma}^2_{(m,j)}) + 2(m+j)\ln(n)/n$$

where

$$\tilde{\sigma}^2_{(m,j)} = \frac{1}{n}\sum_{t=t_0}^{n}\left(\tilde{z}_t - \sum_{i=1}^{m}\phi_i^{(m,j)}\tilde{z}_{t-i} - \sum_{k=1}^{j}\theta_k^{(m,j)}\hat{\epsilon}_{t-k}\right)^2$$

where $t_0 = p_\epsilon + \max(m, j)$.

A MINIC table is then constructed using $\mathrm{BIC}(m, j)$; see Table 7.6. If $p_{max} > p_{\epsilon,min}$, the preceding regression might fail due to linear dependence on the estimated error series and the mean-corrected series. Values of $\mathrm{BIC}(m, j)$ that cannot be computed are set to missing. For large autoregressive and moving-average test orders with relatively few observations, a nearly perfect fit can result. This condition can be identified by a large negative $\mathrm{BIC}(m, j)$ value.
Table 7.6  MINIC Table

               MA
   AR   0          1          2          3
   0    BIC(0,0)   BIC(0,1)   BIC(0,2)   BIC(0,3)
   1    BIC(1,0)   BIC(1,1)   BIC(1,2)   BIC(1,3)
   2    BIC(2,0)   BIC(2,1)   BIC(2,2)   BIC(2,3)
   3    BIC(3,0)   BIC(3,1)   BIC(3,2)   BIC(3,3)
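The following sketch shows how a MINIC table such as Table 7.6 can be requested; the data set name and the order ranges are illustrative assumptions. The PERROR= option controls the range of AR orders considered when estimating the error series:

   proc arima data=work.series;
      /* Compute BIC(m,j) for m = 0..5 and j = 0..5; the error series   */
      /* is estimated from a high-order AR fit whose order is chosen    */
      /* in the range 8..11.                                            */
      identify var=y minic perror=(8:11) p=(0:5) q=(0:5);
   run;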
The SCAN Method

The smallest canonical (SCAN) correlation method can tentatively identify the orders of a stationary or nonstationary ARMA process. Tsay and Tiao (1985) proposed the technique, and Box, Jenkins, and Reinsel (1994) and Choi (1992) provide useful descriptions of the algorithm.

Given a stationary or nonstationary time series $\{z_t : 1 \le t \le n\}$ with mean corrected form $\tilde{z}_t = z_t - \mu_z$ with a true autoregressive order of $p + d$ and with a true moving-average order of $q$, you can use the SCAN method to analyze eigenvalues of the correlation matrix of the ARMA process. The following paragraphs provide a brief description of the algorithm.

For autoregressive test order $m = p_{min}, \ldots, p_{max}$ and for moving-average test order $j = q_{min}, \ldots, q_{max}$, perform the following steps.

1. Let $Y_{m,t} = (\tilde{z}_t, \tilde{z}_{t-1}, \ldots, \tilde{z}_{t-m})'$. Compute the following $(m+1) \times (m+1)$ matrices

$$\hat{\beta}(m, j+1) = \left(\sum_t Y_{m,t-j-1} Y_{m,t-j-1}'\right)^{-1}\left(\sum_t Y_{m,t-j-1} Y_{m,t}'\right)$$

$$\hat{\beta}^*(m, j+1) = \left(\sum_t Y_{m,t} Y_{m,t}'\right)^{-1}\left(\sum_t Y_{m,t} Y_{m,t-j-1}'\right)$$

$$\hat{A}^*(m, j) = \hat{\beta}^*(m, j+1)\,\hat{\beta}(m, j+1)$$

where $t$ ranges from $j+m+2$ to $n$.

2. Find the smallest eigenvalue, $\hat{\lambda}^*(m, j)$, of $\hat{A}^*(m, j)$ and its corresponding normalized eigenvector, $\Phi_{m,j} = (1, -\phi_1^{(m,j)}, -\phi_2^{(m,j)}, \ldots, -\phi_m^{(m,j)})$. The squared canonical correlation estimate is $\hat{\lambda}^*(m, j)$.

3. Using the $\Phi_{m,j}$ as AR(m) coefficients, obtain the residuals for $t = j+m+1$ to $n$, by following the formula: $w_t^{(m,j)} = \tilde{z}_t - \phi_1^{(m,j)}\tilde{z}_{t-1} - \phi_2^{(m,j)}\tilde{z}_{t-2} - \cdots - \phi_m^{(m,j)}\tilde{z}_{t-m}$.

4. From the sample autocorrelations of the residuals, $r_k(w)$, approximate the standard error of the squared canonical correlation estimate by

$$\mathrm{var}(\hat{\lambda}^*(m, j)^{1/2}) \approx d(m, j)/(n - m - j)$$
where $d(m, j) = \left(1 + 2\sum_{i=1}^{j-1} r_i(w^{(m,j)})\right)$.

The test statistic to be used as an identification criterion is

$$c(m, j) = -(n - m - j)\ln(1 - \hat{\lambda}^*(m, j)/d(m, j))$$

which is asymptotically $\chi^2_1$ if $m = p + d$ and $j \ge q$ or if $m \ge p + d$ and $j = q$. For $m > p + d$ and $j < q$, there is more than one theoretical zero canonical correlation between $Y_{m,t}$ and $Y_{m,t-j-1}$. Since the $\hat{\lambda}^*(m, j)$ are the smallest canonical correlations for each $(m, j)$, the percentiles of $c(m, j)$ are less than those of a $\chi^2_1$; therefore, Tsay and Tiao (1985) state that it is safe to assume a $\chi^2_1$. For $m < p + d$ and $j < q$, no conclusions about the distribution of $c(m, j)$ are made.

A SCAN table is then constructed using $c(m, j)$ to determine which of the $\hat{\lambda}^*(m, j)$ are significantly different from zero (see Table 7.7). The ARMA orders are tentatively identified by finding a (maximal) rectangular pattern in which the $\hat{\lambda}^*(m, j)$ are insignificant for all test orders $m \ge p + d$ and $j \ge q$. There may be more than one pair of values $(p + d, q)$ that permit such a rectangular pattern. In this case, parsimony and the number of insignificant items in the rectangular pattern should help determine the model order. Table 7.8 depicts the theoretical pattern associated with an ARMA(2,2) series.
Table 7.7  SCAN Table

               MA
   AR   0        1        2        3
   0    c(0,0)   c(0,1)   c(0,2)   c(0,3)
   1    c(1,0)   c(1,1)   c(1,2)   c(1,3)
   2    c(2,0)   c(2,1)   c(2,2)   c(2,3)
   3    c(3,0)   c(3,1)   c(3,2)   c(3,3)

Table 7.8  Theoretical SCAN Table for an ARMA(2,2) Series

               MA
   AR   0  1  2  3  4  5  6  7
   0    *  X  X  X  X  X  X  X
   1    *  X  X  X  X  X  X  X
   2    *  X  0  0  0  0  0  0
   3    *  X  0  0  0  0  0  0
   4    *  X  0  0  0  0  0  0

   X = significant terms
   0 = insignificant terms
   * = no pattern
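As an illustrative sketch (the data set name, variable name, and order ranges are assumptions), the SCAN option in the IDENTIFY statement produces a table of the c(m, j) statistics such as Table 7.7:

   proc arima data=work.series;
      /* Tabulate the SCAN test statistics and report the (p+d, q)     */
      /* pairs consistent with a rectangular pattern of insignificant  */
      /* values.                                                       */
      identify var=y scan p=(0:5) q=(0:5);
   run;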
Stationarity Tests

When a time series has a unit root, the series is nonstationary and the ordinary least squares (OLS) estimator is not normally distributed. Dickey (1976) and Dickey and Fuller (1979) studied the limiting distribution of the OLS estimator of autoregressive models for time series with a simple unit root. Dickey, Hasza, and Fuller (1984) obtained the limiting distribution for time series with seasonal unit roots. Hamilton (1994) discusses the various types of unit root testing. For a description of Dickey-Fuller tests, see the section "PROBDF Function for Dickey-Fuller Tests" on page 162 in Chapter 5. See Chapter 8, "The AUTOREG Procedure," for a description of Phillips-Perron tests. The random-walk-with-drift test indicates whether an integrated time series has a drift term. Hamilton (1994) discusses this test.
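For illustration, the following sketch (the data set and variable names are hypothetical, and the exact form of the RW= suboption should be checked against the IDENTIFY statement syntax) requests augmented Dickey-Fuller tests and the random-walk-with-drift test through the STATIONARITY= option of the IDENTIFY statement:

   proc arima data=work.series;
      /* Augmented Dickey-Fuller unit root tests with 0, 1, and 2 */
      /* augmenting lags.                                          */
      identify var=y stationarity=(adf=(0,1,2));
      /* Random-walk-with-drift test.                              */
      identify var=y stationarity=(rw=(0,1,2));
   run;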
Prewhitening

If, as is usually the case, an input series is autocorrelated, the direct cross-correlation function between the input and response series gives a misleading indication of the relation between the input and response series. One solution to this problem is called prewhitening. You first fit an ARIMA model for the input series sufficient to reduce the residuals to white noise; then, filter the input series with this model to get the white noise residual series. You then filter the response series with the same model and cross-correlate the filtered response with the filtered input series.

The ARIMA procedure performs this prewhitening process automatically when you precede the IDENTIFY statement for the response series with IDENTIFY and ESTIMATE statements to fit a model for the input series. If a model with no inputs was previously fit to a variable specified by the CROSSCORR= option, then that model is used to prewhiten both the input series and the response series before the cross-correlations are computed for the input series. For example,

   proc arima data=in;
      identify var=x;
      estimate p=1 q=1;
      identify var=y crosscorr=x;
   run;
Both X and Y are filtered by the ARMA(1,1) model fit to X before the cross-correlations are computed. Note that prewhitening is done to estimate the cross-correlation function; the unfiltered series are used in any subsequent ESTIMATE or FORECAST statements, and the correlation functions of Y with its own lags are computed from the unfiltered Y series. But initial values in the ESTIMATE
statement are obtained with prewhitened data; therefore, the result with prewhitening can be different from the result without prewhitening. To suppress prewhitening for all input variables, use the CLEAR option in the IDENTIFY statement to make PROC ARIMA disregard all previous models.
Prewhitening and Differencing If the VAR= and CROSSCORR= options specify differencing, the series are differenced before the prewhitening filter is applied. When the differencing lists specified in the VAR= option for an input and in the CROSSCORR= option for that input are not the same, PROC ARIMA combines the two lists so that the differencing operators used for prewhitening include all differences in either list (in the least common multiple sense).
Identifying Transfer Function Models When identifying a transfer function model with multiple input variables, the cross-correlation functions can be misleading if the input series are correlated with each other. Any dependencies among two or more input series will confound their cross-correlations with the response series. The prewhitening technique assumes that the input variables do not depend on past values of the response variable. If there is feedback from the response variable to an input variable, as evidenced by significant cross-correlation at negative lags, both the input and the response variables need to be prewhitened before meaningful cross-correlations can be computed. PROC ARIMA cannot handle feedback models. The STATESPACE and VARMAX procedures are more appropriate for models with feedback.
Missing Values and Autocorrelations To compute the sample autocorrelation function when missing values are present, PROC ARIMA uses only crossproducts that do not involve missing values and employs divisors that reflect the number of crossproducts used rather than the total length of the series. Sample partial autocorrelations and inverse autocorrelations are then computed by using the sample autocorrelation function. If necessary, a taper is employed to transform the sample autocorrelations into a positive definite sequence before calculating the partial autocorrelation and inverse correlation functions. The confidence intervals produced for these functions might not be valid when there are missing values. The distributional properties for sample correlation functions are not clear for finite samples. See Dunsmuir (1984) for some asymptotic properties of the sample correlation functions.
Estimation Details The ARIMA procedure primarily uses the computational methods outlined by Box and Jenkins. Marquardt’s method is used for the nonlinear least squares iterations. Numerical approximations of the derivatives of the sum-of-squares function are taken by using a fixed delta (controlled by the DELTA= option). The methods do not always converge successfully for a given set of data, particularly if the starting values for the parameters are not close to the least squares estimates.
Back-Forecasting The unconditional sum of squares is computed exactly; thus, back-forecasting is not performed. Early versions of SAS/ETS software used the back-forecasting approximation and allowed a positive value of the BACKLIM= option to control the extent of the back-forecasting. In the current version, requesting a positive number of back-forecasting steps with the BACKLIM= option has no effect.
Preliminary Estimation

If an autoregressive or moving-average operator is specified with no missing lags, preliminary estimates of the parameters are computed by using the autocorrelations computed in the IDENTIFY stage. Otherwise, the preliminary estimates are arbitrarily set to values that produce stable polynomials. When preliminary estimation is not performed by PROC ARIMA, then initial values of the coefficients for any given autoregressive or moving-average factor are set to 0.1 if the degree of the polynomial associated with the factor is 9 or less. Otherwise, the coefficients are determined by expanding the polynomial $(1 - 0.1B)$ to an appropriate power by using a recursive algorithm. These preliminary estimates are the starting values in an iterative algorithm to compute estimates of the parameters.
Estimation Methods

Maximum Likelihood
The METHOD=ML option produces maximum likelihood estimates. The likelihood function is maximized via nonlinear least squares using Marquardt's method. Maximum likelihood estimates are more expensive to compute than the conditional least squares estimates; however, they may be preferable in some cases (Ansley and Newbold 1980; Davidson 1981).

The maximum likelihood estimates are computed as follows. Let the univariate ARMA model be

$$\phi(B)(W_t - \mu_t) = \theta(B)a_t$$

where $a_t$ is an independent sequence of normally distributed innovations with mean 0 and variance $\sigma^2$. Here $\mu_t$ is the mean parameter $\mu$ plus the transfer function inputs. The log-likelihood function can be written as follows:

$$-\frac{1}{2\sigma^2}\mathbf{x}'\boldsymbol{\Omega}^{-1}\mathbf{x} - \frac{1}{2}\ln(|\boldsymbol{\Omega}|) - \frac{n}{2}\ln(\sigma^2)$$

In this equation, $n$ is the number of observations, $\sigma^2\boldsymbol{\Omega}$ is the variance of $\mathbf{x}$ as a function of the $\phi$ and $\theta$ parameters, and $|\cdot|$ denotes the determinant. The vector $\mathbf{x}$ is the time series $W_t$ minus the structural part of the model $\mu_t$, written as a column vector, as follows:

$$\mathbf{x} = \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_n \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$$

The maximum likelihood estimate (MLE) of $\sigma^2$ is

$$s^2 = \frac{1}{n}\mathbf{x}'\boldsymbol{\Omega}^{-1}\mathbf{x}$$

Note that the default estimator of the variance divides by $n - r$, where $r$ is the number of parameters in the model, instead of by $n$. Specifying the NODF option causes a divisor of $n$ to be used.

The log-likelihood concentrated with respect to $\sigma^2$ can be taken up to additive constants as

$$-\frac{n}{2}\ln(\mathbf{x}'\boldsymbol{\Omega}^{-1}\mathbf{x}) - \frac{1}{2}\ln(|\boldsymbol{\Omega}|)$$

Let $\mathbf{H}$ be the lower triangular matrix with positive elements on the diagonal such that $\mathbf{H}\mathbf{H}' = \boldsymbol{\Omega}$. Let $\mathbf{e}$ be the vector $\mathbf{H}^{-1}\mathbf{x}$. The concentrated log-likelihood with respect to $\sigma^2$ can now be written as

$$-\frac{n}{2}\ln(\mathbf{e}'\mathbf{e}) - \ln(|\mathbf{H}|)$$

or

$$-\frac{n}{2}\ln\left(|\mathbf{H}|^{1/n}\,\mathbf{e}'\mathbf{e}\,|\mathbf{H}|^{1/n}\right)$$

The MLE is produced by using a Marquardt algorithm to minimize the following sum of squares:

$$|\mathbf{H}|^{1/n}\,\mathbf{e}'\mathbf{e}\,|\mathbf{H}|^{1/n}$$

The subsequent analysis of the residuals is done by using $\mathbf{e}$ as the vector of residuals.

Unconditional Least Squares
The METHOD=ULS option produces unconditional least squares estimates. The ULS method is also referred to as the exact least squares (ELS) method. For METHOD=ULS, the estimates minimize

$$\sum_{t=1}^{n}\tilde{a}_t^2 = \sum_{t=1}^{n}\left(x_t - \mathbf{C}_t\mathbf{V}_t^{-1}(x_1, \ldots, x_{t-1})'\right)^2$$

where $\mathbf{C}_t$ is the covariance matrix of $x_t$ and $(x_1, \ldots, x_{t-1})$, and $\mathbf{V}_t$ is the variance matrix of $(x_1, \ldots, x_{t-1})$. In fact, $\sum_{t=1}^{n}\tilde{a}_t^2$ is the same as $\mathbf{x}'\boldsymbol{\Omega}^{-1}\mathbf{x}$, and hence $\mathbf{e}'\mathbf{e}$. Therefore, the unconditional least squares estimates are obtained by minimizing the sum of squared residuals rather than using the log-likelihood as the criterion function.
Conditional Least Squares
The METHOD=CLS option produces conditional least squares estimates. The CLS estimates are conditional on the assumption that the past unobserved errors are equal to 0. The series $x_t$ can be represented in terms of the previous observations, as follows:

$$x_t = a_t + \sum_{i=1}^{\infty}\pi_i x_{t-i}$$

The $\pi$ weights are computed from the ratio of the $\phi$ and $\theta$ polynomials, as follows:

$$\frac{\phi(B)}{\theta(B)} = 1 - \sum_{i=1}^{\infty}\pi_i B^i$$

The CLS method produces estimates minimizing

$$\sum_{t=1}^{n}\hat{a}_t^2 = \sum_{t=1}^{n}\left(x_t - \sum_{i=1}^{\infty}\hat{\pi}_i x_{t-i}\right)^2$$

where the unobserved past values of $x_t$ are set to 0 and $\hat{\pi}_i$ are computed from the estimates of $\phi$ and $\theta$ at each iteration.

For METHOD=ULS and METHOD=ML, initial estimates are computed using the METHOD=CLS algorithm.
Start-up for Transfer Functions When computing the noise series for transfer function and intervention models, the start-up for the transferred variable is done by assuming that past values of the input series are equal to the first value of the series. The estimates are then obtained by applying least squares or maximum likelihood to the noise series. Thus, for transfer function models, the ML option does not generate the full (multivariate ARMA) maximum likelihood estimates, but it uses only the univariate likelihood function applied to the noise series. Because PROC ARIMA uses all of the available data for the input series to generate the noise series, other start-up options for the transferred series can be implemented by prefixing an observation to the beginning of the real data. For example, if you fit a transfer function model to the variable Y with the single input X, then you can employ a start-up using 0 for the past values by prefixing to the actual data an observation with a missing value for Y and a value of 0 for X.
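A minimal sketch of this start-up technique follows; the data set and variable names are hypothetical, and the sketch assumes Y is the response and X the single input:

   /* Create one artificial leading observation with Y missing and     */
   /* X = 0 so that pre-sample values of the input are treated as 0    */
   /* rather than as the first observed value of X.                    */
   data prefix;
      y = .;
      x = 0;
   run;

   data startup0;
      set prefix actual;   /* the artificial observation comes first */
   run;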
Information Criteria

PROC ARIMA computes and prints two information criteria, Akaike's information criterion (AIC) (Akaike 1974; Harvey 1981) and Schwarz's Bayesian criterion (SBC) (Schwarz 1978). The AIC and SBC are used to compare competing models fit to the same series. The model with the smaller information criterion is said to fit the data better. The AIC is computed as

$$-2\ln(L) + 2k$$

where $L$ is the likelihood function and $k$ is the number of free parameters. The SBC is computed as

$$-2\ln(L) + \ln(n)k$$

where $n$ is the number of residuals that can be computed for the time series. Sometimes Schwarz's Bayesian criterion is called the Bayesian information criterion (BIC).

If METHOD=CLS is used to do the estimation, an approximation value of $L$ is used, where $L$ is based on the conditional sum of squares instead of the exact sum of squares, and a Jacobian factor is left out.
Tests of Residuals

A table of test statistics for the hypothesis that the model residuals are white noise is printed as part of the ESTIMATE statement output. The chi-square statistics used in the test for lack of fit are computed using the Ljung-Box formula

$$\chi^2_m = n(n+2)\sum_{k=1}^{m}\frac{r_k^2}{(n-k)}$$

where

$$r_k = \frac{\sum_{t=1}^{n-k} a_t a_{t+k}}{\sum_{t=1}^{n} a_t^2}$$

and $a_t$ is the residual series. This formula has been suggested by Ljung and Box (1978) as yielding a better fit to the asymptotic chi-square distribution than the Box-Pierce Q statistic. Some simulation studies of the finite sample properties of this statistic are given by Davies, Triggs, and Newbold (1977) and by Ljung and Box (1978). When the time series has missing values, Stoffer and Toloi (1992) suggest a modification of this test statistic that has improved distributional properties over the standard Ljung-Box formula given above. When the series contains missing values, this modified test statistic is used by default.

Each chi-square statistic is computed for all lags up to the indicated lag value and is not independent of the preceding chi-square values. The null hypothesis tested is that the current set of autocorrelations is white noise.
t-values The t values reported in the table of parameter estimates are approximations whose accuracy depends on the validity of the model, the nature of the model, and the length of the observed series. When the length of the observed series is short and the number of estimated parameters is large with respect to the series length, the t approximation is usually poor. Probability values that correspond to a t distribution should be interpreted carefully because they may be misleading.
Cautions during Estimation

The ARIMA procedure uses a general nonlinear least squares estimation method that can yield problematic results if your data do not fit the model. Output should be examined carefully. The GRID option can be used to ensure the validity and quality of the results. Problems you might encounter include the following:

- Preliminary moving-average estimates might not converge. If this occurs, preliminary estimates are derived as described previously in "Preliminary Estimation" on page 252. You can supply your own preliminary estimates with the ESTIMATE statement options.

- The estimates can lead to an unstable time series process, which can cause extreme forecast values or overflows in the forecast.

- The Jacobian matrix of partial derivatives might be singular; usually, this happens because not all the parameters are identifiable. Removing some of the parameters or using a longer time series might help.

- The iterative process might not converge. PROC ARIMA's estimation method stops after n iterations, where n is the value of the MAXITER= option. If an iteration does not improve the SSE, the Marquardt parameter is increased by a factor of ten until parameters that have a smaller SSE are obtained or until the limit value of the Marquardt parameter is exceeded.

- For METHOD=CLS, the estimates might converge but not to least squares estimates. The estimates might converge to a local minimum, the numerical calculations might be distorted by data whose sum-of-squares surface is not smooth, or the minimum might lie outside the region of invertibility or stationarity.

- If the data are differenced and a moving-average model is fit, the parameter estimates might try to converge exactly on the invertibility boundary. In this case, the standard error estimates that are based on derivatives might be inaccurate.
Specifying Inputs and Transfer Functions Input variables and transfer functions for them can be specified using the INPUT= option in the ESTIMATE statement. The variables used in the INPUT= option must be included in the CROSSCORR= list in the previous IDENTIFY statement. If any differencing is specified in the CROSSCORR= list, then the differenced variable is used as the input to the transfer function.
General Syntax of the INPUT= Option

The general syntax of the INPUT= option is

   ESTIMATE ... INPUT=( transfer-function variable ... )
The transfer function for an input variable is optional. The name of a variable by itself can be used to specify a pure regression term for the variable.
If specified, the syntax of the transfer function is

   S $ (L_{1,1}, L_{1,2}, ...)(L_{2,1}, ...)... / (L_{i,1}, L_{i,2}, ...)(L_{i+1,1}, ...)...

S is the number of periods of time delay (lag) for this input series. Each term in parentheses specifies a polynomial factor with parameters at the lags specified by the $L_{i,j}$ values. The terms before the slash (/) are numerator factors. The terms after the slash (/) are denominator factors. All three parts are optional.

Commas can optionally be used between input specifications to make the INPUT= option more readable. The $ sign after the shift is also optional.

Except for the first numerator factor, each of the terms $L_{i,1}, L_{i,2}, \ldots, L_{i,k}$ indicates a factor of the form

$$(1 - \omega_{i,1}B^{L_{i,1}} - \omega_{i,2}B^{L_{i,2}} - \cdots - \omega_{i,k}B^{L_{i,k}})$$
The form of the first numerator factor depends on the ALTPARM option. By default, the constant 1 in the first numerator factor is replaced with a free parameter $\omega_0$.
Alternative Model Parameterization

When the ALTPARM option is specified, the $\omega_0$ parameter is factored out so that it multiplies the entire transfer function, and the first numerator factor has the same form as the other factors. The ALTPARM option does not materially affect the results; it just presents the results differently. Some people prefer to see the model written one way, while others prefer the alternative representation. Table 7.9 illustrates the effect of the ALTPARM option.

Table 7.9  The ALTPARM Option

   INPUT= Option              ALTPARM   Model
   INPUT=((1 2)(12)/(1)X);    No        $(\omega_0 - \omega_1 B - \omega_2 B^2)(1 - \omega_3 B^{12})/(1 - \delta_1 B)X_t$
                              Yes       $\omega_0(1 - \omega_1 B - \omega_2 B^2)(1 - \omega_3 B^{12})/(1 - \delta_1 B)X_t$
Differencing and Input Variables

If you difference the response series and use input variables, take care that the differencing operations do not change the meaning of the model. For example, if you want to fit the model

$$Y_t = \frac{\omega_0}{(1 - \delta_1 B)}X_t + \frac{(1 - \theta_1 B)}{(1 - B)(1 - B^{12})}a_t$$

then the IDENTIFY statement must read

   identify var=y(1,12) crosscorr=x(1,12);
   estimate q=1 input=(/(1)x) noconstant;
If instead you specify the differencing as

   identify var=y(1,12) crosscorr=x;
   estimate q=1 input=(/(1)x) noconstant;
then the model being requested is

$$Y_t = \frac{\omega_0}{(1 - \delta_1 B)(1 - B)(1 - B^{12})}X_t + \frac{(1 - \theta_1 B)}{(1 - B)(1 - B^{12})}a_t$$
which is a very different model. The point to remember is that a differencing operation requested for the response variable specified by the VAR= option is applied only to that variable and not to the noise term of the model.
Initial Values

The syntax for giving initial values to transfer function parameters in the INITVAL= option parallels the syntax of the INPUT= option. For each transfer function in the INPUT= option, the INITVAL= option should give an initialization specification followed by the input series name. The initialization specification for each transfer function has the form

   C $ (V_{1,1}, V_{1,2}, ...)(V_{2,1}, ...)... / (V_{i,1}, ...)...

where C is the lag 0 term in the first numerator factor of the transfer function (or the overall scale factor if the ALTPARM option is specified) and $V_{i,j}$ is the coefficient of the $L_{i,j}$ element in the transfer function.

To illustrate, suppose you want to fit the model

$$Y_t = \mu + \frac{(\omega_0 - \omega_1 B - \omega_2 B^2)}{(1 - \delta_1 B - \delta_2 B^2 - \delta_3 B^3)}X_{t-3} + \frac{1}{(1 - \phi_1 B - \phi_2 B^3)}a_t$$

and start the estimation process with the initial values $\mu$=10, $\omega_0$=1, $\omega_1$=0.5, $\omega_2$=0.03, $\delta_1$=0.8, $\delta_2$=−0.1, $\delta_3$=0.002, $\phi_1$=0.1, $\phi_2$=0.01. (These are arbitrary values for illustration only.) You would use the following statements:

   identify var=y crosscorr=x;
   estimate p=(1,3)
            input=(3$(1,2)/(1,2,3)x)
            mu=10 ar=.1 .01
            initval=(1$(.5,.03)/(.8,-.1,.002)x);
Note that the lags specified for a particular factor are sorted, so initial values should be given in sorted order. For example, if the P= option had been entered as P=(3,1) instead of P=(1,3), the model would be the same and so would the AR= option. Sorting is done within all factors, including transfer function factors, so initial values should always be given in order of increasing lags.
Here is another illustration, showing initialization for a factored model with multiple inputs. The model is

$$Y_t = \mu + \frac{\omega_{1,0}}{(1 - \delta_{1,1}B)}W_t + (\omega_{2,0} - \omega_{2,1}B)X_{t-3} + \frac{1}{(1 - \phi_1 B)(1 - \phi_2 B^6 - \phi_3 B^{12})}a_t$$

and the initial values are $\mu$=10, $\omega_{1,0}$=5, $\delta_{1,1}$=0.8, $\omega_{2,0}$=1, $\omega_{2,1}$=0.5, $\phi_1$=0.1, $\phi_2$=0.05, and $\phi_3$=0.01. You would use the following statements:

   identify var=y crosscorr=(w x);
   estimate p=(1)(6,12)
            input=(/(1)w, 3$(1)x)
            mu=10 ar=.1 .05 .01
            initval=(5$/(.8)w 1$(.5)x);
Stationarity and Invertibility By default, PROC ARIMA requires that the parameter estimates for the AR and MA parts of the model always remain in the stationary and invertible regions, respectively. The NOSTABLE option removes this restriction and for high-order models can save some computer time. Note that using the NOSTABLE option does not necessarily result in an unstable model being fit, since the estimates can leave the stable region for some iterations but still ultimately converge to stable values. Similarly, by default, the parameter estimates for the denominator polynomial of the transfer function part of the model are also restricted to be stable. The NOTFSTABLE option can be used to remove this restriction.
Naming of Model Parameters In the table of parameter estimates produced by the ESTIMATE statement, model parameters are referred to by using the naming convention described in this section. The parameters in the noise part of the model are named as ARi,j or MAi,j, where AR refers to autoregressive parameters and MA to moving-average parameters. The subscript i refers to the particular polynomial factor, and the subscript j refers to the jth term within the ith factor. These terms are sorted in order of increasing lag within factors, so the subscript j refers to the jth term after sorting. When inputs are used in the model, the parameters of each transfer function are named NUMi,j and DENi,j. The jth term in the ith factor of a numerator polynomial is named NUMi,j. The jth term in the ith factor of a denominator polynomial is named DENi,j. This naming process is repeated for each input variable, so if there are multiple inputs, parameters in transfer functions for different input series have the same name. The table of parameter estimates
shows in the “Variable” column the input with which each parameter is associated. The parameter name shown in the “Parameter” column and the input variable name shown in the “Variable” column must be combined to fully identify transfer function parameters. The lag 0 parameter in the first numerator factor for the first input variable is named NUM1. For subsequent input variables, the lag 0 parameter in the first numerator factor is named NUMk, where k is the position of the input variable in the INPUT= option list. If the ALTPARM option is specified, the NUMk parameter is replaced by an overall scale parameter named SCALEk. For the mean and noise process parameters, the response series name is shown in the “Variable” column. The lag and shift for each parameter are also shown in the table of parameter estimates when inputs are used.
Missing Values and Estimation and Forecasting

Estimation and forecasting are carried out in the presence of missing values by forecasting the missing values with the current set of parameter estimates. The maximum likelihood algorithm employed was suggested by Jones (1980) and is used for both unconditional least squares (ULS) and maximum likelihood (ML) estimation.

The CLS algorithm simply fills in missing values with infinite memory forecast values, computed by forecasting ahead from the nonmissing past values as far as required by the structure of the missing values. These artificial values are then employed in the nonmissing value CLS algorithm. Artificial values are updated at each iteration along with parameter estimates.

For models with input variables, embedded missing values (that is, missing values other than at the beginning or end of the series) are not generally supported. Embedded missing values in input variables are supported for the special case of a multiple regression model that has ARIMA errors. A multiple regression model is specified by an INPUT= option that simply lists the input variables (possibly with lag shifts) without any numerator or denominator transfer function factors. One-step-ahead forecasts are not available for the response variable when one or more of the input variables have missing values.

When embedded missing values are present for a model with complex transfer functions, PROC ARIMA uses the first continuous nonmissing piece of each series to do the analysis. That is, PROC ARIMA skips observations at the beginning of each series until it encounters a nonmissing value and then uses the data from there until it encounters another missing value or until the end of the data is reached. This makes the current version of PROC ARIMA compatible with earlier releases that did not allow embedded missing values.
Forecasting Details If the model has input variables, a forecast beyond the end of the data for the input variables is possible only if univariate ARIMA models have previously been fit to the input variables or future values for the input variables are included in the DATA= data set.
If input variables are used, the forecast standard errors and confidence limits of the response depend on the estimated forecast error variance of the predicted inputs. If several input series are used, the forecast errors for the inputs should be independent; otherwise, the standard errors and confidence limits for the response series will not be accurate. If future values for the input variables are included in the DATA= data set, the standard errors of the forecasts will be underestimated since these values are assumed to be known with certainty. The forecasts are generated using forecasting equations consistent with the method used to estimate the model parameters. Thus, the estimation method specified in the ESTIMATE statement also controls the way forecasts are produced by the FORECAST statement. If METHOD=CLS is used, the forecasts are infinite memory forecasts, also called conditional forecasts. If METHOD=ULS or METHOD=ML, the forecasts are finite memory forecasts, also called unconditional forecasts. A complete description of the steps to produce the series forecasts and their standard errors by using either of these methods is quite involved, and only a brief explanation of the algorithm is given in the next two sections. Additional details about the finite and infinite memory forecasts can be found in Brockwell and Davis (1991). The prediction of stationary ARMA processes is explained in Chapter 5, and the prediction of nonstationary ARMA processes is given in Chapter 9 of Brockwell and Davis (1991).
Infinite Memory Forecasts

If METHOD=CLS is used, the forecasts are infinite memory forecasts, also called conditional forecasts. The term conditional is used because the forecasts are computed by assuming that the unknown values of the response series before the start of the data are equal to the mean of the series. Thus, the forecasts are conditional on this assumption.

The series $x_t$ can be represented as

$$x_t = a_t + \sum_{i=1}^{\infty}\pi_i x_{t-i}$$

where $\phi(B)/\theta(B) = 1 - \sum_{i=1}^{\infty}\pi_i B^i$.

The $k$-step forecast of $x_{t+k}$ is computed as

$$\hat{x}_{t+k} = \sum_{i=1}^{k-1}\hat{\pi}_i\hat{x}_{t+k-i} + \sum_{i=k}^{\infty}\hat{\pi}_i x_{t+k-i}$$

where unobserved past values of $x_t$ are set to zero and $\hat{\pi}_i$ is obtained from the estimated parameters $\hat{\phi}$ and $\hat{\theta}$.
Finite Memory Forecasts For METHOD=ULS or METHOD=ML, the forecasts are finite memory forecasts, also called unconditional forecasts. For finite memory forecasts, the covariance function of the ARMA model is used to derive the best linear prediction equation.
That is, the $k$-step forecast of $x_{t+k}$, given $(x_1, \ldots, x_{t-1})$, is

$$\tilde{x}_{t+k} = \mathbf{C}_{k,t}\mathbf{V}_t^{-1}(x_1, \ldots, x_{t-1})'$$

where $\mathbf{C}_{k,t}$ is the covariance of $x_{t+k}$ and $(x_1, \ldots, x_{t-1})$ and $\mathbf{V}_t$ is the covariance matrix of the vector $(x_1, \ldots, x_{t-1})$. $\mathbf{C}_{k,t}$ and $\mathbf{V}_t$ are derived from the estimated parameters.

Finite memory forecasts minimize the mean squared error of prediction if the parameters of the ARMA model are known exactly. (In most cases, the parameters of the ARMA model are estimated, so the predictors are not true best linear forecasts.)

If the response series is differenced, the final forecast is produced by summing the forecast of the differenced series. This summation and the forecast are conditional on the initial values of the series. Thus, when the response series is differenced, the final forecasts are not true finite memory forecasts because they are derived by assuming that the differenced series begins in a steady-state condition. Thus, they fall somewhere between finite memory and infinite memory forecasts. In practice, there is seldom any practical difference between these forecasts and true finite memory forecasts.
Forecasting Log Transformed Data

The log transformation is often used to convert time series that are nonstationary with respect to the innovation variance into stationary time series. The usual approach is to take the log of the series in a DATA step and then apply PROC ARIMA to the transformed data. A DATA step is then used to transform the forecasts of the logs back to the original units of measurement. The confidence limits are also transformed by using the exponential function.

As one alternative, you can simply exponentiate the forecast series. This procedure gives a forecast for the median of the series, but the antilog of the forecast log series underpredicts the mean of the original series. If you want to predict the expected value of the series, you need to take into account the standard error of the forecast, as shown in the following example, which uses an AR(2) model to forecast the log of a series Y:

   data in;
      set in;
      ylog = log( y );
   run;

   proc arima data=in;
      identify var=ylog;
      estimate p=2;
      forecast lead=10 out=out;
   run;

   data out;
      set out;
      y        = exp( ylog );
      l95      = exp( l95 );
      u95      = exp( u95 );
      forecast = exp( forecast + std*std/2 );
   run;
Specifying Series Periodicity

The INTERVAL= option is used together with the ID= variable to describe the observations that make up the time series. For example, INTERVAL=MONTH specifies a monthly time series in which each observation represents one month. See Chapter 4, "Date Intervals, Formats, and Functions," for details about the interval values supported.

The variable specified by the ID= option in the PROC ARIMA statement identifies the time periods associated with the observations. Usually, SAS date, time, or datetime values are used for this variable. PROC ARIMA uses the ID= variable in the following ways:

- to validate the data periodicity. When the INTERVAL= option is specified, PROC ARIMA uses the ID variable to check the data and verify that successive observations have valid ID values that correspond to successive time intervals. When the INTERVAL= option is not used, PROC ARIMA verifies that the ID values are nonmissing and in ascending order.

- to check for gaps in the input observations. For example, if INTERVAL=MONTH and an input observation for April 1970 follows an observation for January 1970, there is a gap in the input data with two omitted observations (namely February and March 1970). A warning message is printed when a gap in the input data is found.

- to label the forecast observations in the output data set. PROC ARIMA extrapolates the values of the ID variable for the forecast observations from the ID value at the end of the input data according to the frequency specifications of the INTERVAL= option. If the INTERVAL= option is not specified, PROC ARIMA extrapolates the ID variable by incrementing the ID variable value for the last observation in the input data by 1 for each forecast period. Values of the ID variable over the range of the input data are copied to the output data set.

The ALIGN= option is used to align the ID variable to the beginning, middle, or end of the time ID interval specified by the INTERVAL= option.
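The following sketch (the data set and variable names are hypothetical) shows the ID= and INTERVAL= options used in a FORECAST statement to label monthly observations and extrapolate DATE values for the forecast periods:

   proc arima data=sales;
      identify var=units(1);
      estimate q=1;
      /* DATE values are checked against INTERVAL=MONTH, gaps are */
      /* reported, and the 12 forecast observations receive       */
      /* extrapolated DATE values.                                */
      forecast lead=12 id=date interval=month out=results;
   run;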
Detecting Outliers

You can use the OUTLIER statement to detect changes in the level of the response series that are not accounted for by the estimated model. The types of changes considered are additive outliers (AO), level shifts (LS), and temporary changes (TC).

Let $\xi_t$ be a regression variable that describes some type of change in the mean response. In time series literature $\xi_t$ is called a shock signature. An additive outlier at some time point $s$ corresponds to a shock signature $\xi_t$ such that $\xi_s = 1.0$ and $\xi_t$ is 0.0 at all other points. Similarly a permanent level shift that originates at time $s$ has a shock signature such that $\xi_t$ is 0.0 for $t < s$ and 1.0 for $t \ge s$. A temporary level shift of duration $d$ that originates at time $s$ has $\xi_t$ equal to 1.0 between $s$ and $s + d$ and 0.0 otherwise.
Suppose that you are estimating the ARIMA model

$$D(B)Y_t = \mu_t + \frac{\theta(B)}{\phi(B)}a_t$$

where $Y_t$ is the response series, $D(B)$ is the differencing polynomial in the backward shift operator $B$ (possibly identity), $\mu_t$ is the transfer function input, $\phi(B)$ and $\theta(B)$ are the AR and MA polynomials, respectively, and $a_t$ is the Gaussian white noise series.

The problem of detection of level shifts in the OUTLIER statement is formulated as a problem of sequential selection of shock signatures that improve the model in the ESTIMATE statement. This is similar to the forward selection process in the stepwise regression procedure. The selection process starts with considering shock signatures of the type specified in the TYPE= option, originating at each nonmissing measurement. This involves testing $H_0: \beta = 0$ versus $H_a: \beta \ne 0$ in the model

$$D(B)(Y_t - \beta\xi_t) = \mu_t + \frac{\theta(B)}{\phi(B)}a_t$$

for each of these shock signatures. The most significant shock signature, if it also satisfies the significance criterion in the ALPHA= option, is included in the model. If no significant shock signature is found, then the outlier detection process stops; otherwise this augmented model, which incorporates the selected shock signature in its transfer function input, becomes the null model for the subsequent selection process. This iterative process stops if at any stage no more significant shock signatures are found or if the number of iterations exceeds the maximum search number that results due to the MAXNUM= and MAXPCT= settings. In all these iterations, the parameters of the ARIMA model in the ESTIMATE statement are held fixed.

The precise details of the testing procedure for a given shock signature $\xi_t$ are as follows: The preceding testing problem is equivalent to testing $H_0: \beta = 0$ versus $H_a: \beta \ne 0$ in the following "regression with ARMA errors" model

$$N_t = \beta\zeta_t + \frac{\theta(B)}{\phi(B)}a_t$$

where $N_t = (D(B)Y_t - \mu_t)$ is the "noise" process and $\zeta_t = D(B)\xi_t$ is the "effective" shock signature.

In this setting, under $H_0$, $N = (N_1, N_2, \ldots, N_n)^T$ is a mean zero Gaussian vector with variance-covariance matrix $\sigma^2\Omega$. Here $\sigma^2$ is the variance of the white noise process $a_t$ and $\Omega$ is the variance-covariance matrix associated with the ARMA model. Moreover, under $H_a$, $N$ has $\beta\zeta$ as the mean vector where $\zeta = (\zeta_1, \zeta_2, \ldots, \zeta_n)^T$. Additionally, the generalized least squares estimate of $\beta$ and its variance is given by

$$\hat{\beta} = \delta/\kappa$$
$$\mathrm{Var}(\hat{\beta}) = \sigma^2/\kappa$$

where $\delta = \zeta^T\Omega^{-1}N$ and $\kappa = \zeta^T\Omega^{-1}\zeta$. The test statistic $\tau^2 = \delta^2/(\sigma^2\kappa)$ is used to test the significance of $\beta$, which has an approximate chi-squared distribution with 1 degree of freedom under $H_0$. The type of estimate of $\sigma^2$ used in the calculation of $\tau^2$ can be specified by the SIGMA= option. The default setting is SIGMA=ROBUST, which corresponds to a robust estimate suggested in an
outlier detection procedure in X-12-ARIMA, the Census Bureau's time series analysis program; see Findley et al. (1998) for additional information. The robust estimate of $\sigma^2$ is computed by the formula

$$\hat{\sigma}^2 = (1.49 \times \mathrm{Median}(|\hat{a}_t|))^2$$

where $\hat{a}_t$ are the standardized residuals of the null ARIMA model. The setting SIGMA=MSE corresponds to the usual mean squared error estimate (MSE) computed the same way as in the ESTIMATE statement with the NODF option. The quantities $\delta$ and $\kappa$ are efficiently computed by a method described in de Jong and Penzer (1998); see also Kohn and Ansley (1985).
Modeling in the Presence of Outliers

In practice, modeling and forecasting time series data in the presence of outliers is a difficult problem for several reasons. The presence of outliers can adversely affect the model identification and estimation steps. Their presence close to the end of the observation period can have a serious impact on the forecasting performance of the model. In some cases, level shifts are associated with changes in the mechanism that drives the observation process, and separate models might be appropriate to different sections of the data. In view of all these difficulties, diagnostic tools such as outlier detection and residual analysis are essential in any modeling process. The following modeling strategy, which incorporates level shift detection in the familiar Box-Jenkins modeling methodology, seems to work in many cases (a sketch of the strategy in code form follows this list):

1. Proceed with model identification and estimation as usual. Suppose this results in a tentative ARIMA model, say M.

2. Check for additive and permanent level shifts unaccounted for by the model M by using the OUTLIER statement. In this step, unless there is evidence to justify it, the number of level shifts searched should be kept small.

3. Augment the original data set with the regression variables that correspond to the detected outliers.

4. Include the first few of these regression variables in M, and call this model M1. Reestimate all the parameters of M1. It is important not to include too many of these outlier variables in the model in order to avoid the danger of over-fitting.

5. Check the adequacy of M1 by examining the parameter estimates, residual analysis, and outlier detection. Refine it more if necessary.
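The following is a minimal sketch of steps 1 and 2 and the refit in step 4; the data set, variable names, and the detected shift date are hypothetical assumptions made for illustration:

   /* Steps 1-2: fit a tentative model M and search for a small number */
   /* of additive outliers and level shifts it does not explain.       */
   proc arima data=work.series;
      identify var=y(1);
      estimate q=1 method=ml;
      outlier type=(additive shift) alpha=0.01 maxnum=3 id=date;
   run;

   /* Steps 3-4: suppose a level shift was reported at 01JUN1990;      */
   /* create the corresponding regression variable and refit (M1).     */
   data augmented;
      set work.series;
      ls1 = ( date >= '01jun1990'd );
   run;

   proc arima data=augmented;
      identify var=y(1) crosscorr=(ls1(1));
      estimate q=1 input=(ls1) method=ml;
   run;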
OUT= Data Set The output data set produced by the OUT= option of the PROC ARIMA or FORECAST statements contains the following:
- the BY variables

- the ID variable

- the variable specified by the VAR= option in the IDENTIFY statement, which contains the actual values of the response series

- FORECAST, a numeric variable that contains the one-step-ahead predicted values and the multistep forecasts

- STD, a numeric variable that contains the standard errors of the forecasts

- a numeric variable that contains the lower confidence limits of the forecast. This variable is named L95 by default but has a different name if the ALPHA= option specifies a different size for the confidence limits.

- RESIDUAL, a numeric variable that contains the differences between actual and forecast values

- a numeric variable that contains the upper confidence limits of the forecast. This variable is named U95 by default but has a different name if the ALPHA= option specifies a different size for the confidence limits.

The ID variable, the BY variables, and the response variable are the only ones copied from the input to the output data set. In particular, the input variables are not copied to the OUT= data set.

Unless the NOOUTALL option is specified, the data set contains the whole time series. The FORECAST variable has the one-step forecasts (predicted values) for the input periods, followed by n forecast values, where n is the LEAD= value. The actual and RESIDUAL values are missing beyond the end of the series.

If you specify the same OUT= data set in different FORECAST statements, the latter FORECAST statements overwrite the output from the previous FORECAST statements. If you want to combine the forecasts from different FORECAST statements in the same output data set, specify the OUT= option once in the PROC ARIMA statement and omit the OUT= option in the FORECAST statements.

When a global output data set is created by the OUT= option in the PROC ARIMA statement, the variables in the OUT= data set are defined by the first FORECAST statement that is executed. The results of subsequent FORECAST statements are vertically concatenated onto the OUT= data set. Thus, if no ID variable is specified in the first FORECAST statement that is executed, no ID variable appears in the output data set, even if one is specified in a later FORECAST statement. If an ID variable is specified in the first FORECAST statement that is executed but not in a later FORECAST statement, the value of the ID variable is the same as the last value processed for the ID variable for all observations created by the later FORECAST statement. Furthermore, even if the response variable changes in subsequent FORECAST statements, the response variable name in the output data set is that of the first response variable analyzed.
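As a sketch of combining forecasts in one output data set (the data set and variable names are hypothetical), specify OUT= once in the PROC ARIMA statement and omit it from the FORECAST statements:

   proc arima data=in out=allfor;
      identify var=y1;
      estimate p=1;
      forecast lead=12 id=date interval=month;
      identify var=y2;
      estimate q=1;
      forecast lead=12 id=date interval=month;
   run;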
OUTCOV= Data Set

The output data set produced by the OUTCOV= option of the IDENTIFY statement contains the following variables:

- LAG, a numeric variable that contains the lags that correspond to the values of the covariance variables. The values of LAG range from 0 to N for covariance functions and from −N to N for cross-covariance functions, where N is the value of the NLAG= option.

- VAR, a character variable that contains the name of the variable specified by the VAR= option.

- CROSSVAR, a character variable that contains the name of the variable specified in the CROSSCORR= option, which labels the different cross-covariance functions. The CROSSVAR variable is blank for the autocovariance observations. When there is no CROSSCORR= option, this variable is not created.

- N, a numeric variable that contains the number of observations used to calculate the current value of the covariance or cross-covariance function.

- COV, a numeric variable that contains the autocovariance or cross-covariance function values. COV contains the autocovariances of the VAR= variable when the value of the CROSSVAR variable is blank. Otherwise COV contains the cross-covariances between the VAR= variable and the variable named by the CROSSVAR variable.

- CORR, a numeric variable that contains the autocorrelation or cross-correlation function values. CORR contains the autocorrelations of the VAR= variable when the value of the CROSSVAR variable is blank. Otherwise CORR contains the cross-correlations between the VAR= variable and the variable named by the CROSSVAR variable.

- STDERR, a numeric variable that contains the standard errors of the autocorrelations. The standard error estimate is based on the hypothesis that the process that generates the time series is a pure moving-average process of order LAG−1. For the cross-correlations, STDERR contains the value $1/\sqrt{n}$, which approximates the standard error under the hypothesis that the two series are uncorrelated.

- INVCORR, a numeric variable that contains the inverse autocorrelation function values of the VAR= variable. For cross-correlation observations (that is, when the value of the CROSSVAR variable is not blank), INVCORR contains missing values.

- PARTCORR, a numeric variable that contains the partial autocorrelation function values of the VAR= variable. For cross-correlation observations (that is, when the value of the CROSSVAR variable is not blank), PARTCORR contains missing values.
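For illustration (the data set and variable names are hypothetical), the following statements write these correlation functions for lags 0 through 12 (and −12 through 12 for the cross functions) to a data set:

   proc arima data=work.series;
      identify var=y crosscorr=(x) nlag=12 outcov=covout;
   run;

   proc print data=covout;
   run;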
OUTEST= Data Set PROC ARIMA writes the parameter estimates for a model to an output data set when the OUTEST= option is specified in the ESTIMATE statement. The OUTEST= data set contains the following:
- the BY variables

- _MODLABEL_, a character variable that contains the model label, if it is provided by using the label option in the ESTIMATE statement (otherwise this variable is not created)

- _NAME_, a character variable that contains the name of the parameter for the covariance or correlation observations or is blank for the observations that contain the parameter estimates. (This variable is not created if neither OUTCOV nor OUTCORR is specified.)

- _TYPE_, a character variable that identifies the type of observation. A description of the _TYPE_ variable values is given below.

- variables for model parameters

The variables for the model parameters are named as follows:

ERRORVAR   This numeric variable contains the variance estimate. The _TYPE_=EST observation for this variable contains the estimated error variance, and the remaining observations are missing.

MU         This numeric variable contains values for the mean parameter for the model. (This variable is not created if NOCONSTANT is specified.)

MAj_k      These numeric variables contain values for the moving-average parameters. The variables for moving-average parameters are named MAj_k, where j is the factor number and k is the index of the parameter within a factor.

ARj_k      These numeric variables contain values for the autoregressive parameters. The variables for autoregressive parameters are named ARj_k, where j is the factor number and k is the index of the parameter within a factor.

Ij_k       These variables contain values for the transfer function parameters. Variables for transfer function parameters are named Ij_k, where j is the number of the INPUT variable associated with the transfer function component and k is the number of the parameter for the particular INPUT variable. INPUT variables are numbered according to the order in which they appear in the INPUT= list.

_STATUS_   This variable describes the convergence status of the model. A value of 0_CONVERGED indicates that the model converged.
The value of the _TYPE_ variable for each observation indicates the kind of value contained in the variables for model parameters for the observation. The OUTEST= data set contains observations with the following _TYPE_ values:

EST      The observation contains parameter estimates.

STD      The observation contains approximate standard errors of the estimates.

CORR     The observation contains correlations of the estimates. OUTCORR must be specified to get these observations.

COV      The observation contains covariances of the estimates. OUTCOV must be specified to get these observations.

FACTOR   The observation contains values that identify for each parameter the factor that contains it. Negative values indicate denominator factors in transfer function models.

LAG      The observation contains values that identify the lag associated with each parameter.

SHIFT    The observation contains values that identify the shift associated with the input series for the parameter.

The values given for _TYPE_=FACTOR, _TYPE_=LAG, or _TYPE_=SHIFT observations enable you to reconstruct the model employed when provided with only the OUTEST= data set.
OUTEST= Examples

This section clarifies how model parameters are stored in the OUTEST= data set with two examples.

Consider the following example:

   proc arima data=input;
      identify var=y cross=(x1 x2);
      estimate p=(1)(6) q=(1,3)(12) input=(x1 x2) outest=est;
   run;

   proc print data=est;
   run;
The model specified by these statements is

$$Y_t = \mu + \omega_{1,0}X_{1,t} + \omega_{2,0}X_{2,t} + \frac{(1 - \theta_{11}B - \theta_{12}B^3)(1 - \theta_{21}B^{12})}{(1 - \phi_{11}B)(1 - \phi_{21}B^6)}a_t$$
The OUTEST= data set contains the values shown in Table 7.10.

Table 7.10  OUTEST= Data Set for First Example

   Obs  _TYPE_   Y    MU      MA1_1     MA1_2     MA2_1     AR1_1     AR2_1     I1_1       I2_1
   1    EST      σ²   μ       θ11       θ12       θ21       φ11       φ21       ω1,0       ω2,0
   2    STD      .    se μ    se θ11    se θ12    se θ21    se φ11    se φ21    se ω1,0    se ω2,0
   3    FACTOR   .    0       1         1         2         1         2         1          1
   4    LAG      .    0       1         3         12        1         6         0          0
   5    SHIFT    .    0       0         0         0         0         0         0          0
Note that the symbols in the rows for _TYPE_=EST and _TYPE_=STD in Table 7.10 would be numeric values in a real data set. Next, consider the following example:

   proc arima data=input;
      identify var=y cross=(x1 x2);
      estimate p=1 q=1
               input=(2 $ (1)/(1,2)x1  1 $ /(1)x2)
               outest=est;
   run;

   proc print data=est;
   run;
The model specified by these statements is

$$ Y_t = \mu + \frac{\omega_{10} - \omega_{11} B}{1 - \delta_{11} B - \delta_{12} B^{2}}\, X_{1,t-2}
       + \frac{\omega_{20}}{1 - \delta_{21} B}\, X_{2,t-1}
       + \frac{1 - \theta_{1} B}{1 - \varphi_{1} B}\, a_t $$
The OUTEST= data set contains the values shown in Table 7.11.

Table 7.11  OUTEST= Data Set for Second Example

Obs  _TYPE_   Y     MU     MA1_1   AR1_1   I1_1     I1_2     I1_3     I1_4     I2_1     I2_2
 1   EST      σₐ²   μ      θ1      φ1      ω10      ω11      δ11      δ12      ω20      δ21
 2   STD      .     se μ   se θ1   se φ1   se ω10   se ω11   se δ11   se δ12   se ω20   se δ21
 3   FACTOR   .     0      1       1       1        1        -1       -1       1        -1
 4   LAG      .     0      1       1       0        1        1        2        0        1
 5   SHIFT    .     0      0       0       2        2        2        2        1        1
OUTMODEL= SAS Data Set

The OUTMODEL= option in the ESTIMATE statement writes an output data set that enables you to reconstruct the model. The OUTMODEL= data set contains much the same information as the OUTEST= data set but in a transposed form that might be more useful for some purposes. In addition, the OUTMODEL= data set includes the differencing operators.

The OUTMODEL data set contains the following:

   the BY variables

   _MODLABEL_, a character variable that contains the model label, if it is provided by using the label option in the ESTIMATE statement (otherwise this variable is not created)

   _NAME_, a character variable that contains the name of the response or input variable for the observation

   _TYPE_, a character variable that contains the estimation method that was employed. The value of _TYPE_ can be CLS, ULS, or ML.

   _STATUS_, a character variable that describes the convergence status of the model. A value of 0_CONVERGED indicates that the model converged.

   _PARM_, a character variable that contains the name of the parameter given by the observation. _PARM_ takes on the values ERRORVAR, MU, AR, MA, NUM, DEN, and DIF.

   _VALUE_, a numeric variable that contains the value of the estimate defined by the _PARM_ variable

   _STD_, a numeric variable that contains the standard error of the estimate

   _FACTOR_, a numeric variable that indicates the number of the factor to which the parameter belongs

   _LAG_, a numeric variable that contains the number of the term within the factor that contains the parameter

   _SHIFT_, a numeric variable that contains the shift value for the input variable associated with the current parameter

The values of _FACTOR_ and _LAG_ identify which particular MA, AR, NUM, or DEN parameter estimate is given by the _VALUE_ variable. The _NAME_ variable contains the response variable name for the MU, AR, or MA parameters. Otherwise, _NAME_ contains the input variable name associated with NUM or DEN parameter estimates. The _NAME_ variable contains the appropriate variable name associated with the current DIF observation as well. The _VALUE_ variable is 1 for all DIF observations, and the _LAG_ variable indicates the degree of differencing employed.

The observations contained in the OUTMODEL= data set are identified by the _PARM_ variable. A description of the values of the _PARM_ variable follows:

NUMRESID    _VALUE_ contains the number of residuals.

NPARMS      _VALUE_ contains the number of parameters in the model.

NDIFS       _VALUE_ contains the sum of the differencing lags employed for the response variable.

ERRORVAR    _VALUE_ contains the estimate of the innovation variance.

MU          _VALUE_ contains the estimate of the mean term.

AR          _VALUE_ contains the estimate of the autoregressive parameter indexed by the _FACTOR_ and _LAG_ variable values.

MA          _VALUE_ contains the estimate of a moving-average parameter indexed by the _FACTOR_ and _LAG_ variable values.

NUM         _VALUE_ contains the estimate of the parameter in the numerator factor of the transfer function of the input variable indexed by the _FACTOR_, _LAG_, and _SHIFT_ variable values.

DEN         _VALUE_ contains the estimate of the parameter in the denominator factor of the transfer function of the input variable indexed by the _FACTOR_, _LAG_, and _SHIFT_ variable values.

DIF         _VALUE_ contains the difference operator defined by the difference lag given by the value in the _LAG_ variable.
OUTSTAT= Data Set

PROC ARIMA writes the diagnostic statistics for a model to an output data set when the OUTSTAT= option is specified in the ESTIMATE statement. The OUTSTAT data set contains the following:

   the BY variables

   _MODLABEL_, a character variable that contains the model label, if it is provided by using the label option in the ESTIMATE statement (otherwise this variable is not created)

   _TYPE_, a character variable that contains the estimation method used. _TYPE_ can have the value CLS, ULS, or ML.

   _STAT_, a character variable that contains the name of the statistic given by the _VALUE_ variable in this observation. _STAT_ takes on the values AIC, SBC, LOGLIK, SSE, NUMRESID, NPARMS, NDIFS, ERRORVAR, MU, CONV, and NITER.

   _VALUE_, a numeric variable that contains the value of the statistic named by the _STAT_ variable

The observations contained in the OUTSTAT= data set are identified by the _STAT_ variable. A description of the values of the _STAT_ variable follows:

AIC         Akaike's information criterion

SBC         Schwarz's Bayesian criterion

LOGLIK      the log-likelihood, if METHOD=ML or METHOD=ULS is specified

SSE         the sum of the squared residuals

NUMRESID    the number of residuals

NPARMS      the number of parameters in the model

NDIFS       the sum of the differencing lags employed for the response variable

ERRORVAR    the estimate of the innovation variance

MU          the estimate of the mean term

CONV        tells if the estimation converged. The value of 0 signifies that estimation converged. Nonzero values reflect convergence problems.

NITER       the number of iterations
Remark. CONV takes an integer value that corresponds to the error condition of the parameter estimation process. The value of 0 signifies that the estimation process has converged; higher values signify convergence problems of increasing severity. Specifically, CONV = 0 indicates that the estimation process has converged; CONV = 1 or 2 indicates that the estimation process has run into numerical problems (such as encountering an unstable model or a ridge) during the iterations; and CONV >= 3 indicates that the estimation process has failed to converge.
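As a minimal hedged sketch (the data set input and the variable y are hypothetical), the following statements write the diagnostic statistics to an OUTSTAT= data set and print it; each observation of the printed data set pairs one _STAT_ value with its _VALUE_.

   proc arima data=input;
      identify var=y(1);
      estimate q=1 outstat=stat;
   run;

   proc print data=stat;
   run;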
Printed Output

The ARIMA procedure produces printed output for each of the IDENTIFY, ESTIMATE, and FORECAST statements. The output produced by each ARIMA statement is described in the following sections. If ODS Graphics is enabled, the line printer plots mentioned below are replaced by the corresponding ODS plots.
IDENTIFY Statement Printed Output

The printed output of the IDENTIFY statement consists of the following:

   a table of summary statistics, including the name of the response variable, any specified periods of differencing, the mean and standard deviation of the response series after differencing, and the number of observations after differencing

   a plot of the sample autocorrelation function for lags up to and including the NLAG= option value. Standard errors of the autocorrelations also appear to the right of the autocorrelation plot if the value of the LINESIZE= option is sufficiently large. The standard errors are derived using Bartlett's approximation (Box and Jenkins 1976, p. 177). The approximation for a standard error for the estimated autocorrelation function at lag k is based on a null hypothesis that a pure moving-average Gaussian process of order k-1 generated the time series. The relative position of an approximate 95% confidence interval under this null hypothesis is indicated by the dots in the plot, while the asterisks represent the relative magnitude of the autocorrelation value.

   a plot of the sample inverse autocorrelation function. See the section "The Inverse Autocorrelation Function" on page 243 for more information about the inverse autocorrelation function.

   a plot of the sample partial autocorrelation function

   a table of test statistics for the hypothesis that the series is white noise. These test statistics are the same as the tests for white noise residuals produced by the ESTIMATE statement and are described in the section "Estimation Details" on page 252.

   a plot of the sample cross-correlation function for each series specified in the CROSSCORR= option. If a model was previously estimated for a variable in the CROSSCORR= list, the cross-correlations for that series are computed for the prewhitened input and response series. For each input variable with a prewhitening filter, the cross-correlation report for the input series includes the following:

      - a table of test statistics for the hypothesis of no cross-correlation between the input and response series

      - the prewhitening filter used for the prewhitening transformation of the predictor and response variables

   ESACF tables if the ESACF option is used

   MINIC table if the MINIC option is used

   SCAN table if the SCAN option is used

   STATIONARITY test results if the STATIONARITY option is used
ESTIMATE Statement Printed Output

The printed output of the ESTIMATE statement consists of the following:

   if the PRINTALL option is specified, the preliminary parameter estimates and an iteration history that shows the sequence of parameter estimates tried during the fitting process

   a table of parameter estimates that shows the following for each parameter: the parameter name, the parameter estimate, the approximate standard error, t value, approximate probability (Pr > |t|), the lag for the parameter, the input variable name for the parameter, and the lag or "Shift" for the input variable

   the estimates of the constant term, the innovation variance (variance estimate), the innovation standard deviation (Std Error Estimate), Akaike's information criterion (AIC), Schwarz's Bayesian criterion (SBC), and the number of residuals

   the correlation matrix of the parameter estimates

   a table of test statistics for the hypothesis that the residuals of the model are white noise. The table is titled "Autocorrelation Check of Residuals."

   if the PLOT option is specified, autocorrelation, inverse autocorrelation, and partial autocorrelation function plots of the residuals

   if an INPUT variable has been modeled in such a way that prewhitening is performed in the IDENTIFY step, a table of test statistics titled "Crosscorrelation Check of Residuals." The test statistic is based on the chi-square approximation suggested by Box and Jenkins (1976, pp. 395–396). The cross-correlation function is computed by using the residuals from the model as one series and the prewhitened input variable as the other series.

   if the GRID option is specified, the sum-of-squares or likelihood surface over a grid of parameter values near the final estimates

   a summary of the estimated model that shows the autoregressive factors, moving-average factors, and transfer function factors in backshift notation with the estimated parameter values
OUTLIER Statement Printed Output

The printed output of the OUTLIER statement consists of the following:

   a summary that contains the information about the maximum number of outliers searched, the number of outliers actually detected, and the significance level used in the outlier detection

   a table that contains the results of the outlier detection process. The outliers are listed in the order in which they are found. This table contains the following columns:

      - The Obs column contains the observation number of the start of the level shift.

      - If an ID= option is specified, then the Time ID column contains the time identification labels of the start of the outlier.

      - The Type column lists the type of the outlier.

      - The Estimate column contains β̂, the estimate of the regression coefficient of the shock signature.

      - The Chi-Square column lists the value of the test statistic χ².

      - The Approx Prob > ChiSq column lists the approximate p-value of the test statistic.
FORECAST Statement Printed Output

The printed output of the FORECAST statement consists of the following:

   a summary of the estimated model

   a table of forecasts with the following columns:

      - The Obs column contains the observation number.

      - The Forecast column contains the forecast values.

      - The Std Error column contains the forecast standard errors.

      - The Lower and Upper columns contain the approximate 95% confidence limits. The ALPHA= option can be used to change the confidence interval for forecasts.

      - If the PRINTALL option is specified, the forecast table also includes columns for the actual values of the response series (Actual) and the residual values (Residual).
ODS Table Names

PROC ARIMA assigns a name to each table it creates. You can use these names to reference the table when you use the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 7.12.

Table 7.12  ODS Tables Produced by PROC ARIMA

ODS Table Name       Description                                           Statement   Option
ChiSqAuto            chi-square statistics table for autocorrelation       IDENTIFY
ChiSqCross           chi-square statistics table for cross-correlations    IDENTIFY    CROSSCORR
CorrGraph            Correlations graph                                    IDENTIFY
DescStats            Descriptive statistics                                IDENTIFY
ESACF                Extended sample autocorrelation function              IDENTIFY    ESACF
ESACFPValues         ESACF probability values                              IDENTIFY    ESACF
IACFGraph            Inverse autocorrelations graph                        IDENTIFY
InputDescStats       Input descriptive statistics                          IDENTIFY
MINIC                Minimum information criterion                         IDENTIFY    MINIC
PACFGraph            Partial autocorrelations graph                        IDENTIFY
SCAN                 Squared canonical correlation estimates               IDENTIFY    SCAN
SCANPValues          SCAN chi-square probability values                    IDENTIFY    SCAN
StationarityTests    Stationarity tests                                    IDENTIFY    STATIONARITY
TentativeOrders      Tentative order selection                             IDENTIFY    MINIC, ESACF, or SCAN
ARPolynomial         Filter equations                                      ESTIMATE
ChiSqAuto            chi-square statistics table for autocorrelation       ESTIMATE
ChiSqCross           chi-square statistics table for cross-correlations    ESTIMATE
CorrB                Correlations of the estimates                         ESTIMATE
DenPolynomial        Filter equations                                      ESTIMATE
FitStatistics        Fit statistics                                        ESTIMATE
IterHistory          Iteration history                                     ESTIMATE    PRINTALL
InitialAREstimates   Initial autoregressive parameter estimates            ESTIMATE
InitialMAEstimates   Initial moving-average parameter estimates            ESTIMATE
InputDescription     Input description                                     ESTIMATE
MAPolynomial         Filter equations                                      ESTIMATE
ModelDescription     Model description                                     ESTIMATE
NumPolynomial        Filter equations                                      ESTIMATE
ParameterEstimates   Parameter estimates                                   ESTIMATE
PrelimEstimates      Preliminary estimates                                 ESTIMATE
ObjectiveGrid        Objective function grid matrix                        ESTIMATE    GRID
OptSummary           ARIMA estimation optimization                         ESTIMATE    PRINTALL
OutlierDetails       Detected outliers                                     OUTLIER
Forecasts            Forecast                                              FORECAST
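For example, the following sketch (with hypothetical data set and variable names) uses one of these table names with the ODS OUTPUT statement to capture the parameter estimates table in a data set:

   proc arima data=input;
      identify var=y(1);
      estimate q=1;
      ods output ParameterEstimates=pe;
   run;

   proc print data=pe;
   run;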
Statistical Graphics

This section provides information about the basic ODS statistical graphics produced by the ARIMA procedure. To request graphics with PROC ARIMA, you must first enable ODS Graphics by specifying the ODS GRAPHICS ON; statement. See Chapter 21, "Statistical Graphics Using ODS" (SAS/STAT User's Guide), for more information.

The main types of plots available are as follows:

   plots useful in the trend and correlation analysis of the dependent and input series

   plots useful for the residual analysis of an estimated model

   forecast plots

You can obtain most plots relevant to the specified model by default if ODS Graphics is enabled. For finer control of the graphics, you can use the PLOTS= option in the PROC ARIMA statement. The following example is a simple illustration of how to use the PLOTS= option.
Airline Series: Illustration of ODS Graphics

The series in this example, the monthly airline passenger series, is also discussed later, in Example 7.2. The following statements fit an ARIMA(0,1,1)×(0,1,1)₁₂ model without a mean term to the logarithms of the airline passengers series, xlog. Notice the use of the global plot option ONLY in the PLOTS= option of the PROC ARIMA statement. It suppresses the production of default graphics and produces only the plots specified by the subsequent RESIDUAL and FORECAST plot options. The RESIDUAL(SMOOTH) plot specification produces a time series plot of residuals that has an overlaid loess fit; see Figure 7.21. The FORECAST(FORECAST) option produces a plot that shows the one-step-ahead forecasts as well as the multistep-ahead forecasts; see Figure 7.22.

   ods graphics on;

   proc arima data=seriesg
              plots(only)=(residual(smooth) forecast(forecasts));
      identify var=xlog(1,12);
      estimate q=(1)(12) noint method=ml;
      forecast id=date interval=month;
   run;
Figure 7.21 Residual Plot of the Airline Model
Figure 7.22 Forecast Plot of the Airline Model
ODS Graph Names

PROC ARIMA assigns a name to each graph it creates by using ODS. You can use these names to reference the graphs when you use ODS. The names are listed in Table 7.13.

Table 7.13  ODS Graphics Produced by PROC ARIMA

ODS Graph Name      Plot Description                                        Option
SeriesPlot          Time series plot of the dependent series                PLOTS(UNPACK)
SeriesACFPlot       Autocorrelation plot of the dependent series            PLOTS(UNPACK)
SeriesPACFPlot      Partial-autocorrelation plot of the dependent series    PLOTS(UNPACK)
SeriesIACFPlot      Inverse-autocorrelation plot of the dependent series    PLOTS(UNPACK)
SeriesCorrPanel     Series trend and correlation analysis panel             Default
CrossCorrPanel      Cross-correlation plots, either individual or           Default
                    paneled. They are numbered 1, 2, and so on as needed.
ResidualACFPlot     Residual-autocorrelation plot                           PLOTS(UNPACK)
ResidualPACFPlot    Residual-partial-autocorrelation plot                   PLOTS(UNPACK)
ResidualIACFPlot    Residual-inverse-autocorrelation plot                   PLOTS(UNPACK)
ResidualWNPlot      Residual-white-noise-probability plot                   PLOTS(UNPACK)
ResidualHistogram   Residual histogram                                      PLOTS(UNPACK)
ResidualQQPlot      Residual normal Q-Q plot                                PLOTS(UNPACK)
ResidualPlot        Time series plot of residuals with a superimposed       PLOTS=RESIDUAL(SMOOTH)
                    smoother
ForecastsOnlyPlot   Time series plot of multistep forecasts                 Default
ForecastsPlot       Time series plot of one-step-ahead as well as           PLOTS=FORECAST(FORECAST)
                    multistep forecasts
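For instance, the following hedged sketch (using the airline series data set seriesg and variable xlog from the preceding example) uses one of these graph names with the ODS SELECT statement so that only the series trend and correlation analysis panel is displayed:

   ods graphics on;
   ods select SeriesCorrPanel;

   proc arima data=seriesg;
      identify var=xlog(1,12);
   run;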
Examples: ARIMA Procedure
Example 7.1: Simulated IMA Model

This example illustrates the ARIMA procedure results for a case where the true model is known. An integrated moving-average model is used for this illustration. The following DATA step generates a pseudo-random sample of 100 periods from the ARIMA(0,1,1) process

$$ u_t = u_{t-1} + a_t - 0.8\, a_{t-1}, \qquad a_t \sim \text{iid } N(0, 1) $$
   title1 'Simulated IMA(1,1) Series';

   data a;
      u1 = 0.9;
      a1 = 0;
      do i = -50 to 100;
         a = rannor( 32565 );
         u = u1 + a - .8 * a1;
         if i > 0 then output;
         a1 = a;
         u1 = u;
      end;
   run;
The following ARIMA procedure statements identify and estimate the model:

   ods graphics on;

   /*-- Simulated IMA Model --*/
   proc arima data=a;
      identify var=u;
      run;
      identify var=u(1);
      run;
      estimate q=1;
      run;
   quit;
The graphical series correlation analysis output of the first IDENTIFY statement is shown in Output 7.1.1. The output shows the behavior of the sample autocorrelation function when the process is nonstationary. Note that in this case the estimated autocorrelations are not very high, even at small lags. Nonstationarity is reflected in a pattern of significant autocorrelations that do not decline quickly with increasing lag, not in the size of the autocorrelations.
Output 7.1.1 Correlation Analysis from the First IDENTIFY Statement
The second IDENTIFY statement differences the series. The results of the second IDENTIFY statement are shown in Output 7.1.2. This output shows autocorrelation, inverse autocorrelation, and partial autocorrelation functions typical of MA(1) processes.
Output 7.1.2 Correlation Analysis from the Second IDENTIFY Statement
The ESTIMATE statement fits an ARIMA(0,1,1) model to the simulated data. Note that in this case the parameter estimates are reasonably close to the values used to generate the simulated data (μ = 0, estimate 0.02; θ₁ = 0.8, estimate 0.79; σ² = 1, estimate 0.82). Moreover, the graphical analysis of the residuals shows no model inadequacies (see Output 7.1.4 and Output 7.1.5). The ESTIMATE statement results are shown in Output 7.1.3.

Output 7.1.3  Output from Fitting ARIMA(0,1,1) Model

                      Conditional Least Squares Estimation

                                Standard             Approx
Parameter    Estimate              Error    t Value    Pr > |t|    Lag
MU            0.02056            0.01972       1.04      0.2997      0
MA1,1         0.79142            0.06474      12.22      <.0001      1
You can conclude that a positive autocorrelation (φⱼ > 0) exists if the marginal probability based on the computed Durbin-Watson statistic is less than the level of significance (α), while you can conclude that a negative autocorrelation (φⱼ < 0) exists if the marginal probability based on the computed Durbin-Watson statistic is greater than 1 − α. Wallis (1972) presented tables for bounds tests of fourth-order autocorrelation, and Vinod (1973) has given tables for a 5% significance level for orders two to four. Using the AUTOREG procedure, you can calculate the exact p-values for the general order of Durbin-Watson test statistics. Tests for the absence of autocorrelation of order p can be performed sequentially; at the jth step, test H₀: φⱼ = 0 given φ₁ = … = φⱼ₋₁ = 0 against φⱼ ≠ 0. However, the size of the sequential test is not known. The Durbin-Watson statistic is computed from the OLS residuals, while that of the autoregressive error model uses residuals that are the difference between the predicted values and the actual values.
When you use the Durbin-Watson test from the residuals of the autoregressive error model, you must be aware that this test is only an approximation. See "Autoregressive Error Model" on page 370 earlier in this chapter. If there are missing values, the Durbin-Watson statistic is computed using all the nonmissing values and ignoring the gaps caused by missing residuals. This does not affect the significance level of the resulting test, although the power of the test against certain alternatives may be adversely affected. Savin and White (1978) have examined the use of the Durbin-Watson statistic with missing values.

The Durbin-Watson probability calculations have been enhanced to compute the p-value of the generalized Durbin-Watson statistic for large sample sizes. Previously, the Durbin-Watson probabilities were only calculated for small sample sizes.

Consider the following linear regression model:

$$ \mathbf{Y} = \mathbf{X}\beta + \mathbf{u}, \qquad u_t + \varphi_j u_{t-j} = \epsilon_t, \qquad t = 1, \ldots, N $$
where X is an N × k data matrix, β is a k × 1 coefficient vector, u is an N × 1 disturbance vector, and ε_t is a sequence of independent normal error terms with mean 0 and variance σ². The generalized Durbin-Watson statistic is written as

$$ \mathrm{DW}_j = \frac{\hat{\mathbf{u}}' \mathbf{A}_j' \mathbf{A}_j \hat{\mathbf{u}}}{\hat{\mathbf{u}}' \hat{\mathbf{u}}} $$

where û is a vector of OLS residuals and A_j is a (T − j) × T matrix. The generalized Durbin-Watson statistic DW_j can be rewritten as

$$ \mathrm{DW}_j = \frac{\mathbf{Y}' \mathbf{M} \mathbf{A}_j' \mathbf{A}_j \mathbf{M} \mathbf{Y}}{\mathbf{Y}' \mathbf{M} \mathbf{Y}}
              = \frac{\eta' (\mathbf{Q}_1' \mathbf{A}_j' \mathbf{A}_j \mathbf{Q}_1) \eta}{\eta' \eta} $$

where Q₁'Q₁ = I_{T−k}, Q₁'X = 0, and η = Q₁'u.

The marginal probability for the Durbin-Watson statistic is

$$ \Pr(\mathrm{DW}_j < c) = \Pr(h < 0) $$

where h = η'(Q₁'A_j'A_jQ₁ − cI)η.

The p-value or the marginal probability for the generalized Durbin-Watson statistic is computed by numerical inversion of the characteristic function φ(u) of the quadratic form h = η'(Q₁'A_j'A_jQ₁ − cI)η. The trapezoidal rule approximation to the marginal probability Pr(h < 0) is

$$ \Pr(h < 0) = \frac{1}{2} - \sum_{k=0}^{K} \frac{\mathrm{Im}\left[\phi\bigl((k + \tfrac{1}{2})\Delta\bigr)\right]}{\pi (k + \tfrac{1}{2})} + E_I(\Delta) + E_T(K) $$

where Im[φ(·)] is the imaginary part of the characteristic function, and E_I(Δ) and E_T(K) are integration and truncation errors, respectively. Refer to Davies (1973) for numerical inversion of the characteristic function.
Ansley, Kohn, and Shively (1992) proposed a numerically efficient algorithm that requires O(N) operations for evaluation of the characteristic function φ(u). The characteristic function is denoted as

$$ \phi(u) = \left| \mathbf{I} - 2iu\,(\mathbf{Q}_1'\mathbf{A}_j'\mathbf{A}_j\mathbf{Q}_1 - c\,\mathbf{I}_{N-k}) \right|^{-1/2}
           = |\mathbf{V}|^{-1/2}\, \left| \mathbf{X}'\mathbf{V}^{-1}\mathbf{X} \right|^{-1/2}\, \left| \mathbf{X}'\mathbf{X} \right|^{1/2} $$

where V = (1 + 2iuc)I − 2iuA_j'A_j and i = √−1. By applying the Cholesky decomposition to the complex matrix V, you can obtain the lower triangular matrix G that satisfies V = GG'. Therefore, the characteristic function can be evaluated in O(N) operations by using the following formula:

$$ \phi(u) = |\mathbf{G}|^{-1}\, \left| \mathbf{X}^{*\prime}\mathbf{X}^{*} \right|^{-1/2}\, \left| \mathbf{X}'\mathbf{X} \right|^{1/2} $$

where X* = G⁻¹X. Refer to Ansley, Kohn, and Shively (1992) for more information about evaluation of the characteristic function.
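As a minimal sketch (the data set a and the variables y and x are hypothetical), the following statements request the generalized Durbin-Watson statistics up to order 4 together with their exact p-values:

   proc autoreg data=a;
      model y = x / dw=4 dwprob;
   run;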
Tests for Serial Correlation with Lagged Dependent Variables
When regressors contain lagged dependent variables, the Durbin-Watson statistic (d₁) for the first-order autocorrelation is biased toward 2 and has reduced power. Wallis (1972) shows that the bias in the Durbin-Watson statistic (d₄) for the fourth-order autocorrelation is smaller than the bias in d₁ in the presence of a first-order lagged dependent variable. Durbin (1970) proposes two alternative statistics (Durbin h and t) that are asymptotically equivalent. The h statistic is written as

$$ h = \hat{\rho}\, \sqrt{N / (1 - N\hat{V})} $$

where

$$ \hat{\rho} = \sum_{t=2}^{N} \hat{\nu}_t \hat{\nu}_{t-1} \Big/ \sum_{t=1}^{N} \hat{\nu}_t^2 $$

and V̂ is the least squares variance estimate for the coefficient of the lagged dependent variable. Durbin's t test consists of regressing the OLS residuals ν̂_t on explanatory variables and ν̂_{t−1} and testing the significance of the estimate for the coefficient of ν̂_{t−1}.

Inder (1984) shows that the Durbin-Watson test for the absence of first-order autocorrelation is generally more powerful than the h test in finite samples. Refer to Inder (1986) and King and Wu (1991) for the Durbin-Watson test in the presence of lagged dependent variables.
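A hedged sketch of requesting Durbin's h follows; the data set a and the variables y, x, and ylag are hypothetical, and ylag would be created beforehand with the LAG function in a DATA step:

   proc autoreg data=a;
      model y = x ylag / lagdep=ylag;
   run;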
Godfrey LM test
The GODFREY= option in the MODEL statement produces the Godfrey Lagrange multiplier test for serially correlated residuals for each equation (Godfrey 1978a and 1978b). The value r in GODFREY=r is the maximum autoregressive order and specifies that Godfrey's tests be computed for lags 1 through r. The default number of lags is four.
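For example, a minimal sketch (with hypothetical data set and variable names) that requests Godfrey's tests for lags 1 through 4:

   proc autoreg data=a;
      model y = x / godfrey=4;
   run;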
Testing for Nonlinear Dependence: Ramsey's RESET Test

Ramsey's RESET test is a misspecification test associated with the functional form of models to check whether power transforms need to be added to a model. The original linear model, henceforth called the restricted model, is

$$ y_t = x_t \beta + u_t $$

To test for misspecification in the functional form, the unrestricted model is

$$ y_t = x_t \beta + \sum_{j=2}^{p} \phi_j\, \hat{y}_t^{\,j} + u_t $$

where ŷ_t is the predicted value from the linear model and p is the power of ŷ_t in the unrestricted model equation starting from 2. The number of higher-ordered terms to be chosen depends on the discretion of the analyst. The RESET option produces test results for p = 2, 3, and 4.

The RESET test is an F statistic for testing H₀: φⱼ = 0 for all j = 2, …, p, against H₁: φⱼ ≠ 0 for at least one j = 2, …, p in the unrestricted model, and is computed as follows:

$$ F_{(p-1,\; n-k-p+1)} = \frac{(SSE_R - SSE_U)/(p-1)}{SSE_U/(n-k-p+1)} $$

where SSE_R is the sum of squared errors due to the restricted model, SSE_U is the sum of squared errors due to the unrestricted model, n is the total number of observations, and k is the number of parameters in the original linear model. Ramsey's test can be viewed as a linearity test that checks whether any nonlinear transformation of the specified independent variables has been omitted, but it need not help in identifying a new relevant variable other than those already specified in the current model.
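A minimal sketch of requesting the RESET test results for p = 2, 3, and 4 (the data set and variable names are hypothetical):

   proc autoreg data=a;
      model y = x / reset;
   run;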
Testing for Nonlinear Dependence: Heteroscedasticity Tests

Portmanteau Q Test

For nonlinear time series models, the portmanteau test statistic based on squared residuals is used to test for independence of the series (McLeod and Li 1983):

$$ Q(q) = N(N+2) \sum_{i=1}^{q} \frac{r(i; \hat{\nu}_t^2)}{N-i} $$

where

$$ r(i; \hat{\nu}_t^2) = \frac{\sum_{t=i+1}^{N} (\hat{\nu}_t^2 - \hat{\sigma}^2)(\hat{\nu}_{t-i}^2 - \hat{\sigma}^2)}
                              {\sum_{t=1}^{N} (\hat{\nu}_t^2 - \hat{\sigma}^2)^2},
   \qquad \hat{\sigma}^2 = \frac{1}{N} \sum_{t=1}^{N} \hat{\nu}_t^2 $$

This Q statistic is used to test the nonlinear effects (for example, GARCH effects) present in the residuals. The GARCH(p, q) process can be considered as an ARMA(max(p, q), p) process. See the section "Predicting the Conditional Variance" on page 407 later in this chapter. Therefore, the Q statistic calculated from the squared residuals can be used to identify the order of the GARCH process.
Engle's Lagrange Multiplier Test for ARCH Disturbances

Engle (1982) proposed a Lagrange multiplier test for ARCH disturbances. The test statistic is asymptotically equivalent to the test used by Breusch and Pagan (1979). Engle's Lagrange multiplier test for the qth order ARCH process is written

$$ LM(q) = \frac{N\, \mathbf{W}'\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{W}}{\mathbf{W}'\mathbf{W}} $$

where

$$ \mathbf{W} = \left( \frac{\hat{\nu}_1^2}{\hat{\sigma}^2} - 1,\; \ldots,\; \frac{\hat{\nu}_N^2}{\hat{\sigma}^2} - 1 \right)' $$

and

$$ \mathbf{Z} = \begin{bmatrix}
   1 & \hat{\nu}_0^2 & \cdots & \hat{\nu}_{-q+1}^2 \\
   \vdots & \vdots & & \vdots \\
   1 & \hat{\nu}_{N-1}^2 & \cdots & \hat{\nu}_{N-q}^2
\end{bmatrix} $$

The presample values (ν₀², …, ν²₋q₊₁) have been set to 0. Note that the LM(q) tests might have different finite-sample properties depending on the presample values, though they are asymptotically equivalent regardless of the presample values.
Lee and King’s Test for ARCH Disturbances
Engle's Lagrange multiplier test for ARCH disturbances is a two-sided test; that is, it ignores the inequality constraints for the coefficients in ARCH models. Lee and King (1993) propose a one-sided test and prove that the test is locally most mean powerful. Let ε_t, t = 1, …, T, denote the residuals to be tested. Lee and King's test checks

$$ H_0\colon\; \alpha_i = 0,\; i = 1, \ldots, q \qquad\text{versus}\qquad H_1\colon\; \alpha_i > 0,\; i = 1, \ldots, q $$

where the α_i, i = 1, …, q, are the coefficients in the following ARCH(q) model:

$$ \epsilon_t = \sqrt{h_t}\, e_t, \qquad e_t \sim \text{iid}(0, 1) $$

$$ h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 $$

The statistic is written as

$$ S = \frac{\displaystyle \sum_{t=q+1}^{T} \left( \frac{\epsilon_t^2}{h_0} - 1 \right) \sum_{i=1}^{q} \epsilon_{t-i}^2}
            {\displaystyle \left[ 2 \sum_{t=q+1}^{T} \left( \sum_{i=1}^{q} \epsilon_{t-i}^2 \right)^{\!2}
             - \frac{2 \left( \sum_{t=q+1}^{T} \sum_{i=1}^{q} \epsilon_{t-i}^2 \right)^{\!2}}{T-q} \right]^{1/2}} $$
Wong and Li's Test for ARCH Disturbances

Wong and Li (1995) propose a rank portmanteau statistic to minimize the effect of the existence of outliers in the test for ARCH disturbances. They first rank the squared residuals; that is, R_t = rank(ε_t²). Then they calculate the rank portmanteau statistic

$$ Q_R = \sum_{i=1}^{q} \frac{(r_i - \mu_i)^2}{\sigma_i^2} $$

where r_i, μ_i, and σ_i² are defined as follows:

$$ r_i = \frac{\sum_{t=i+1}^{T} \bigl(R_t - (T+1)/2\bigr)\bigl(R_{t-i} - (T+1)/2\bigr)}{T(T^2 - 1)/12} $$

$$ \mu_i = -\frac{T - i}{T(T - 1)} $$

$$ \sigma_i^2 = \frac{5T^4 - (5i+9)T^3 + 9(i-2)T^2 + 2i(5i+8)T + 16i^2}{5(T-1)^2\, T^2\, (T+1)} $$

The Q, Engle's LM, Lee and King's, and Wong and Li's statistics are computed from the OLS residuals, or from the residuals of the autoregressive error model if the NLAG= option is specified, assuming that disturbances are white noise. The Q, Engle's LM, and Wong and Li's statistics have an approximate χ²(q) distribution under the white-noise null hypothesis, while Lee and King's statistic has a standard normal distribution under the white-noise null hypothesis.
Testing for Structural Change: Chow Test

Consider the linear regression model

$$ \mathbf{y} = \mathbf{X}\beta + \mathbf{u} $$

where the parameter vector β contains k elements. Split the observations for this model into two subsets at the break point specified by the CHOW= option, so that

$$ \mathbf{y} = (\mathbf{y}_1', \mathbf{y}_2')', \qquad \mathbf{X} = (\mathbf{X}_1', \mathbf{X}_2')', \qquad \mathbf{u} = (\mathbf{u}_1', \mathbf{u}_2')' $$

Now consider the two linear regressions for the two subsets of the data modeled separately,

$$ \mathbf{y}_1 = \mathbf{X}_1 \beta_1 + \mathbf{u}_1 $$

$$ \mathbf{y}_2 = \mathbf{X}_2 \beta_2 + \mathbf{u}_2 $$

where the number of observations from the first set is n₁ and the number of observations from the second set is n₂.

The Chow test statistic is used to test the null hypothesis H₀: β₁ = β₂ conditional on the same error variance V(u₁) = V(u₂). The Chow test is computed using three sums of square errors:

$$ F_{\mathit{chow}} = \frac{(\hat{\mathbf{u}}'\hat{\mathbf{u}} - \hat{\mathbf{u}}_1'\hat{\mathbf{u}}_1 - \hat{\mathbf{u}}_2'\hat{\mathbf{u}}_2)/k}
                            {(\hat{\mathbf{u}}_1'\hat{\mathbf{u}}_1 + \hat{\mathbf{u}}_2'\hat{\mathbf{u}}_2)/(n_1 + n_2 - 2k)} $$

where û is the regression residual vector from the full set model, û₁ is the regression residual vector from the first set model, and û₂ is the regression residual vector from the second set model. Under the null hypothesis, the Chow test statistic has an F distribution with k and (n₁ + n₂ − 2k) degrees of freedom, where k is the number of elements in β.

Chow (1960) suggested another test statistic that tests the hypothesis that the mean of prediction errors is 0. The predictive Chow test can also be used when n₂ < k. The PCHOW= option computes the predictive Chow test statistic

$$ F_{\mathit{pchow}} = \frac{(\hat{\mathbf{u}}'\hat{\mathbf{u}} - \hat{\mathbf{u}}_1'\hat{\mathbf{u}}_1)/n_2}
                             {\hat{\mathbf{u}}_1'\hat{\mathbf{u}}_1/(n_1 - k)} $$

The predictive Chow test has an F distribution with n₂ and (n₁ − k) degrees of freedom.
Predicted Values

The AUTOREG procedure can produce two kinds of predicted values for the response series and corresponding residuals and confidence limits. The residuals in both cases are computed as the actual value minus the predicted value. In addition, when GARCH models are estimated, the AUTOREG procedure can output predictions of the conditional error variance.
Predicting the Unconditional Mean

The first type of predicted value is obtained from only the structural part of the model, x_t'b. These are useful in predicting values of new response time series, which are assumed to be described by the same model as the current response time series. The predicted values, residuals, and upper and lower confidence limits for the structural predictions are requested by specifying the PREDICTEDM=, RESIDUALM=, UCLM=, or LCLM= option in the OUTPUT statement. The ALPHACLM= option controls the confidence level for UCLM= and LCLM=. These confidence limits are for estimation of the mean of the dependent variable, x_t'b, where x_t is the column vector of independent variables at observation t.

The predicted values are computed as

$$ \hat{y}_t = \mathbf{x}_t' \mathbf{b} $$

and the upper and lower confidence limits as

$$ \hat{u}_t = \hat{y}_t + t_{\alpha/2}\, v, \qquad \hat{l}_t = \hat{y}_t - t_{\alpha/2}\, v $$

where v² is an estimate of the variance of ŷ_t and t_{α/2} is the upper α/2 percentage point of the t distribution,

$$ \Pr(T > t_{\alpha/2}) = \alpha/2 $$

where T is an observation from a t distribution with q degrees of freedom. The value of α can be set with the ALPHACLM= option. The degrees-of-freedom parameter, q, is taken to be the number of observations minus the number of free parameters in the regression and autoregression parts of the model.

For the YW estimation method, the value of v is calculated as

$$ v = \sqrt{ s^2\, \mathbf{x}_t' (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1} \mathbf{x}_t } $$

where s² is the error sum of squares divided by q. For the ULS and ML methods, it is calculated as

$$ v = \sqrt{ s^2\, \mathbf{x}_t' \mathbf{W} \mathbf{x}_t } $$

where W is the k × k submatrix of (J'J)⁻¹ that corresponds to the regression parameters. For details, see the section "Computational Methods" on page 372 earlier in this chapter.
Predicting Future Series Realizations

The other predicted values use both the structural part of the model and the predicted values of the error process. These conditional mean values are useful in predicting future values of the current response time series. The predicted values, residuals, and upper and lower confidence limits for future observations conditional on past values are requested by the PREDICTED=, RESIDUAL=, UCL=, or LCL= option in the OUTPUT statement. The ALPHACLI= option controls the confidence level for UCL= and LCL=. These confidence limits are for the predicted value,

$$ \tilde{y}_t = \mathbf{x}_t' \mathbf{b} + \nu_{t|t-1} $$

where x_t is the vector of independent variables if all independent variables at time t are nonmissing, and ν_{t|t−1} is the minimum variance linear predictor of the error term, which is defined in the following recursive way given the autoregressive model, AR(m) model, for ν_t:

$$ \nu_{s|t} = \begin{cases}
   \sum_{i=1}^{m} \hat{\varphi}_i\, \nu_{s-i|t} & s > t \text{ or observation } s \text{ is missing} \\[4pt]
   y_s - \mathbf{x}_s' \mathbf{b} & 0 < s \le t \text{ and observation } s \text{ is nonmissing} \\[4pt]
   0 & s \le 0
\end{cases} $$

where φ̂_i, i = 1, …, m, are the estimated AR parameters. Observation s is considered to be missing if the dependent variable or at least one independent variable is missing. If some of the independent variables at time t are missing, the predicted ỹ_t is also missing. With the same definition of ν_{s|t}, the prediction method can be easily extended to the multistep forecast of ỹ_{t+d}, d > 0:

$$ \tilde{y}_{t+d} = \mathbf{x}_{t+d}' \mathbf{b} + \nu_{t+d|t-1} $$

The prediction method is implemented through the Kalman filter.

If ỹ_t is not missing, the upper and lower confidence limits are computed as

$$ \tilde{u}_t = \tilde{y}_t + t_{\alpha/2}\, v, \qquad \tilde{l}_t = \tilde{y}_t - t_{\alpha/2}\, v $$

where v, in this case, is computed as

$$ v = \sqrt{ \mathbf{z}_t' \mathbf{V}_\beta \mathbf{z}_t + s^2 r } $$

where V_β is the variance-covariance matrix of the estimation of the regression parameter β, and z_t is defined as

$$ \mathbf{z}_t = \mathbf{x}_t + \sum_{i=1}^{m} \hat{\varphi}_i\, \mathbf{x}_{t-i|t-1} $$

and x_{s|t} is defined in a similar way as ν_{s|t}:

$$ \mathbf{x}_{s|t} = \begin{cases}
   \sum_{i=1}^{m} \hat{\varphi}_i\, \mathbf{x}_{s-i|t} & s > t \text{ or observation } s \text{ is missing} \\[4pt]
   \mathbf{x}_s & 0 < s \le t \text{ and observation } s \text{ is nonmissing} \\[4pt]
   0 & s \le 0
\end{cases} $$

The value s²r is the estimate of the conditional prediction error variance. At the start of the series, and after missing values, r is generally greater than 1. See the section "Predicting the Conditional Variance" on page 407 for the computational details of r. The plot of residuals and confidence limits in Example 8.4 illustrates this behavior.

Except to adjust the degrees of freedom for the error sum of squares, the preceding formulas do not account for the fact that the autoregressive parameters are estimated. In particular, the confidence limits are likely to be somewhat too narrow. In large samples, this is probably not an important effect, but it might be appreciable in small samples. Refer to Harvey (1981) for some discussion of this problem for AR(1) models.

At the beginning of the series (the first m observations, where m is the value of the NLAG= option) and after missing values, these residuals do not match the residuals obtained by using OLS on the transformed variables. This is because, in these cases, the predicted noise values must be based on less than a complete set of past noise values and, thus, have larger variance. The GLS transformation for these observations includes a scale factor in addition to a linear combination of past values. Put another way, the L⁻¹ matrix defined in the section "Computational Methods" on page 372 has the value 1 along the diagonal, except for the first m observations and after missing values.
Predicting the Conditional Variance

The GARCH process can be written

$$ \epsilon_t^2 = \omega + \sum_{i=1}^{n} (\alpha_i + \gamma_i)\, \epsilon_{t-i}^2 - \sum_{j=1}^{p} \gamma_j\, \eta_{t-j} + \eta_t $$
where η_t = ε_t² − h_t and n = max(p, q). This representation shows that the squared residual ε_t² follows an ARMA(n, p) process. Then for any d > 0, the conditional expectations are as follows:

$$ \mathrm{E}(\epsilon_{t+d}^2 \mid \Psi_t) = \omega + \sum_{i=1}^{n} (\alpha_i + \gamma_i)\, \mathrm{E}(\epsilon_{t+d-i}^2 \mid \Psi_t)
   - \sum_{j=1}^{d-1} \gamma_j\, \mathrm{E}(\eta_{t+d-j} \mid \Psi_t) $$

The d-step-ahead prediction error, ε_{t+d} = y_{t+d} − y_{t+d|t}, has the conditional variance

$$ \mathrm{V}(\epsilon_{t+d} \mid \Psi_t) = \sum_{j=0}^{d-1} g_j^2\, \sigma_{t+d-j|t}^2 $$

where

$$ \sigma_{t+d-j|t}^2 = \mathrm{E}(\epsilon_{t+d-j}^2 \mid \Psi_t) $$

Coefficients in the conditional d-step prediction error variance are calculated recursively using the formula

$$ g_j = -\varphi_1 g_{j-1} - \cdots - \varphi_m g_{j-m} $$

where g₀ = 1 and g_j = 0 if j < 0; φ₁, …, φ_m are autoregressive parameters. Since the parameters are not known, the conditional variance is computed using the estimated autoregressive parameters. The d-step-ahead prediction error variance is simplified when there are no autoregressive terms:

$$ \mathrm{V}(\epsilon_{t+d} \mid \Psi_t) = \sigma_{t+d|t}^2 $$

Therefore, the one-step-ahead prediction error variance is equivalent to the conditional error variance defined in the GARCH process:

$$ h_t = \mathrm{E}(\epsilon_t^2 \mid \Psi_{t-1}) = \sigma_{t|t-1}^2 $$
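A minimal hedged sketch follows (the data set returns and the variable y are hypothetical); it fits a GARCH(1,1) model and stores the one-step-ahead conditional error variance h_t by using the HT= option of the OUTPUT statement:

   proc autoreg data=returns;
      model y = / garch=(p=1,q=1);
      output out=o ht=condvar;
   run;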
The multistep forecast of the conditional error variance of the EGARCH, QGARCH, TGARCH, PGARCH, and GARCH-M models cannot be calculated using the preceding formula for the GARCH model. The following formulas are recursively implemented to obtain the multistep forecast of the conditional error variance of these models:

for the EGARCH(p, q) model:

$$ \ln(\sigma_{t+d|t}^2) = \omega + \sum_{i=d}^{q} \alpha_i\, g(z_{t+d-i})
   + \sum_{j=1}^{d-1} \gamma_j \ln(\sigma_{t+d-j|t}^2) + \sum_{j=d}^{p} \gamma_j \ln(h_{t+d-j}) $$

where

$$ g(z_t) = \theta z_t + |z_t| - \mathrm{E}|z_t|, \qquad z_t = \epsilon_t / \sqrt{h_t} $$
for the QGARCH(p, q) model:

$$ \sigma_{t+d|t}^2 = \omega + \sum_{i=1}^{d-1} \alpha_i\, (\sigma_{t+d-i|t}^2 + \psi_i^2)
   + \sum_{i=d}^{q} \alpha_i\, (\epsilon_{t+d-i} - \psi_i)^2
   + \sum_{j=1}^{d-1} \gamma_j\, \sigma_{t+d-j|t}^2 + \sum_{j=d}^{p} \gamma_j\, h_{t+d-j} $$

for the TGARCH(p, q) model:

$$ \sigma_{t+d|t}^2 = \omega + \sum_{i=1}^{d-1} (\alpha_i + \psi_i/2)\, \sigma_{t+d-i|t}^2
   + \sum_{i=d}^{q} (\alpha_i + 1_{\epsilon_{t+d-i} < 0}\, \psi_i)\, \epsilon_{t+d-i}^2
   + \sum_{j=1}^{d-1} \gamma_j\, \sigma_{t+d-j|t}^2 + \sum_{j=d}^{p} \gamma_j\, h_{t+d-j} $$
8. the parameter estimates for the structural model (Estimate), a standard error estimate (Standard Error), the ratio of estimate to standard error (t Value), and an approximation to the significance probability for the parameter being 0 (Approx Pr > |t|)

9. If the NLAG= option is specified with METHOD=ULS or METHOD=ML, the regression parameter estimates are printed again, assuming that the autoregressive parameter estimates are known. In this case, the Standard Error and related statistics for the regression estimates will, in general, be different from the case when they are estimated. Note that from a standpoint of estimation, the Yule-Walker and iterated Yule-Walker methods (NLAG= with METHOD=YW, ITYW) generate only one table, assuming AR parameters are given.

10. If you specify the NORMAL option, the Bera-Jarque normality test statistics are printed. If you specify the LAGDEP option, Durbin's h or Durbin's t is printed.
ODS Table Names

PROC AUTOREG assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 8.2.

Table 8.2  ODS Tables Produced in PROC AUTOREG

ODS Table Name              Description                                               Option

ODS Tables Created by the MODEL Statement
ClassLevels                 Class Levels                                              default
FitSummary                  Summary of regression                                     default
SummaryDepVarCen            Summary of regression (centered dependent var)           CENTER
SummaryNoIntercept          Summary of regression (no intercept)                      NOINT
YWIterSSE                   Yule-Walker iteration sum of squared error                METHOD=ITYW
PreMSE                      Preliminary MSE                                           NLAG=
Dependent                   Dependent variable                                        default
DependenceEquations         Linear dependence equation
ARCHTest                    Tests for ARCH disturbances based on OLS residuals        ARCHTEST=
ARCHTestAR                  Tests for ARCH disturbances based on residuals            ARCHTEST= (with NLAG=)
BDSTest                     BDS test for independence                                 BDS
RunsTest                    Runs test for independence                                RUNS
TurningPointTest            Turning Point test for independence                       TP
VNRRankTest                 Rank version of von Neumann ratio test for independence   VNRRANK
ChowTest                    Chow test and predictive Chow test                        CHOW= PCHOW=
Godfrey                     Godfrey's serial correlation test                         GODFREY
PhilPerron                  Phillips-Perron unit root test                            STATIONARITY=(PHILLIPS) (no regressor)
PhilOul                     Phillips-Ouliaris cointegration test                      STATIONARITY=(PHILLIPS) (has regressor)
ADF                         Augmented Dickey-Fuller unit root test                    STATIONARITY=(ADF) (no regressor)
EngGran                     Engle-Granger cointegration test                          STATIONARITY=(ADF) (has regressor)
ERS                         ERS unit root test                                        STATIONARITY=(ERS)
NgPerron                    Ng-Perron unit root tests                                 STATIONARITY=(NP=)
KPSS                        Kwiatkowski, Phillips, Schmidt, and Shin test             STATIONARITY=(KPSS)
ResetTest                   Ramsey's RESET test                                       RESET
ARParameterEstimates        Estimates of autoregressive parameters                    NLAG=
CorrGraph                   Estimates of autocorrelations                             NLAG=
BackStep                    Backward elimination of autoregressive terms              BACKSTEP
ExpAutocorr                 Expected autocorrelations                                 NLAG=
IterHistory                 Iteration history                                         ITPRINT
ParameterEstimates          Parameter estimates                                       default
ParameterEstimatesGivenAR   Parameter estimates assuming AR parameters are given      NLAG=, METHOD=ULS | ML
PartialAutoCorr             Partial autocorrelation                                   PARTIAL
CovB                        Covariance of parameter estimates                         COVB
CorrB                       Correlation of parameter estimates                        CORRB
CholeskyFactor              Cholesky root of gamma                                    ALL
Coefficients                Coefficients for first NLAG observations                  COEF
GammaInverse                Gamma inverse                                             GINV
ConvergenceStatus           Convergence status table                                  default
MiscStat                    Durbin t or Durbin h, Bera-Jarque normality test          LAGDEP=; NORMAL
DWTest                      Durbin-Watson statistics                                  DW=

ODS Tables Created by the RESTRICT Statement
Restrict                    Restriction table                                         default

ODS Tables Created by the TEST Statement
FTest                       F test                                                    default, TYPE=ALL
WaldTest                    Wald test                                                 TYPE=WALD | ALL
LMTest                      LM test                                                   TYPE=LM | ALL (only supported with GARCH= option)
LRTest                      LR test                                                   TYPE=LR | ALL (only supported with GARCH= option)
ODS Graphics

This section describes the use of ODS for creating graphics with the AUTOREG procedure. To request these graphs, you must specify the ODS GRAPHICS statement. By default, only the residual, predicted versus actual, and autocorrelation of residuals plots are produced. If, in addition to the ODS GRAPHICS statement, you also specify the ALL option in either the PROC AUTOREG statement or the MODEL statement, all plots are created.

For HETERO, GARCH, and AR models, studentized residuals are replaced by standardized residuals. For the autoregressive models, the conditional variance of the residuals is computed as described in the section "Predicting Future Series Realizations" on page 406. For the GARCH and HETERO models, residuals are assumed to have the conditional variance h_t, which can be output with the HT= option of the OUTPUT statement. For all these cases, the Cook's D plot is not produced.
ODS Graph Names

PROC AUTOREG assigns a name to each graph it creates using ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 8.3.

Table 8.3  ODS Graphics Produced by PROC AUTOREG

ODS Graph Name          Plot Description                        Option
ACFPlot                 Autocorrelation of residuals            ACF
FitPlot                 Predicted versus actual plot            Default
CooksD                  Cook's D plot                           ALL (no NLAG=)
IACFPlot                Inverse autocorrelation of residuals    ALL
QQPlot                  Q-Q plot of residuals                   ALL
PACFPlot                Partial autocorrelation of residuals    ALL
ResidualHistogram       Histogram of the residuals              ALL
ResidualPlot            Residual plot                           Default
StudentResidualPlot     Studentized residual plot               ALL (no NLAG=/HETERO=/GARCH=)
StandardResidualPlot    Standardized residual plot              ALL
WhiteNoiseLogProbPlot   Tests for white noise residuals         ALL
Examples: AUTOREG Procedure
Example 8.1: Analysis of Real Output Series

In this example, the annual real output series is analyzed over the period 1901 to 1983 (Balke and Gordon 1986, pp. 581–583). With the following DATA step, the original data are transformed using the natural logarithm, and the differenced series DY is created for further analysis. The log of real output is plotted in Output 8.1.1.

   title 'Analysis of Real GNP';

   data gnp;
      date = intnx( 'year', '01jan1901'd, _n_-1 );
      format date year4.;
      input x @@;
      y = log(x);
      dy = dif(y);
      t = _n_;
      label y = 'Real GNP'
            dy = 'First Difference of Y'
            t = 'Time Trend';
   datalines;

   ... more lines ...
   proc sgplot data=gnp noautolegend;
      scatter x=date y=y;
      xaxis grid values=('01jan1901'd '01jan1911'd '01jan1921'd
                         '01jan1931'd '01jan1941'd '01jan1951'd
                         '01jan1961'd '01jan1971'd '01jan1981'd
                         '01jan1991'd);
   run;
Output 8.1.1 Real Output Series: 1901 – 1983
The (linear) trend-stationary process is estimated using the following form:

$$ y_t = \beta_0 + \beta_1 t + \nu_t $$

where

$$ \nu_t = \epsilon_t - \varphi_1 \nu_{t-1} - \varphi_2 \nu_{t-2}, \qquad \epsilon_t \sim \mathrm{IN}(0, \sigma^2) $$

The preceding trend-stationary model assumes that uncertainty over future horizons is bounded since the error term, ν_t, has a finite variance. The maximum likelihood AR estimates from the statements that follow are shown in Output 8.1.2:

   proc autoreg data=gnp;
      model y = t / nlag=2 method=ml;
   run;
Output 8.1.2  Estimating the Linear Trend Model

                              Analysis of Real GNP

                             The AUTOREG Procedure

                          Maximum Likelihood Estimates

SSE                  0.23954331    DFE                          79
MSE                     0.00303    Root MSE                0.05507
SBC                  -230.39355    AIC                  -240.06891
MAE                  0.04016596    AICC                 -239.55609
MAPE                 0.69458594    HQC                  -236.18189
Durbin-Watson            1.9935    Regress R-Square         0.8645
                                   Total R-Square           0.9947

                              Parameter Estimates

                                   Standard             Approx
Variable     DF     Estimate          Error    t Value    Pr > |t|
Intercept     1       4.8206         0.0661      72.88      <.0001
t             1       0.0302       0.001346      22.45      <.0001
AR1           1      -1.2041         0.1040     -11.58      <.0001
AR2           1       0.3748         0.1039       3.61      0.0005
The corresponding estimates for the differenced series, dy, fit with an AR(1) error model are as follows:

                              Parameter Estimates

                                   Standard             Approx
Variable     DF     Estimate          Error    t Value    Pr > |t|
Intercept     1       0.0293       0.009093       3.22      0.0018
AR1           1      -0.2967         0.1067      -2.78      0.0067

                     Autoregressive parameters assumed given

                                   Standard             Approx
Variable     DF     Estimate          Error    t Value    Pr > |t|
Intercept     1       0.0293       0.009093       3.22      0.0018
Example 8.2: Comparing Estimates and Models

In this example, the Grunfeld series are estimated using different estimation methods. Refer to Maddala (1977) for details of the Grunfeld investment data set. For comparison, the Yule-Walker method, ULS method, and maximum likelihood method estimates are shown. With the DWPROB option, the p-value of the Durbin-Watson statistic is printed. The Durbin-Watson test indicates the positive autocorrelation of the regression residuals. The DATA and PROC steps follow:
   title 'Grunfeld''s Investment Models Fit with Autoregressive Errors';

   data grunfeld;
      input year gei gef gec;
      label gei = 'Gross investment GE'
            gec = 'Lagged Capital Stock GE'
            gef = 'Lagged Value of GE shares';
   datalines;

   ... more lines ...
   proc autoreg data=grunfeld;
      model gei = gef gec / nlag=1 dwprob;
      model gei = gef gec / nlag=1 method=uls;
      model gei = gef gec / nlag=1 method=ml;
   run;
The printed output produced by each of the MODEL statements is shown in Output 8.2.1 through Output 8.2.4.
Output 8.2.1  OLS Analysis of Residuals

            Grunfeld's Investment Models Fit with Autoregressive Errors

                             The AUTOREG Procedure

Dependent Variable    gei    Gross investment GE

                        Ordinary Least Squares Estimates

SSE                  13216.5878    DFE                          17
MSE                   777.44634    Root MSE               27.88272
SBC                  195.614652    AIC                  192.627455
MAE                  19.9433255    AICC                 194.127455
MAPE                 23.2047973    HQC                  193.210587
Durbin-Watson            1.0721    Regress R-Square         0.7053
                                   Total R-Square           0.7053

                              Parameter Estimates

                                 Standard           Approx
Variable    DF   Estimate           Error   t Value   Pr > |t|   Variable Label
Intercept    1    -9.9563         31.3742     -0.32     0.7548
gef          1     0.0266          0.0156      1.71     0.1063   Lagged Value of GE shares
gec          1     0.1517          0.0257      5.90     <.0001   Lagged Capital Stock GE
Output 8.2.3  Regression Results Using Unconditional Least Squares Method

                     Estimates of Autoregressive Parameters

                                Standard
Lag      Coefficient               Error      t Value
  1        -0.460867            0.221867        -2.08

Algorithm converged.

                      Unconditional Least Squares Estimates

SSE                  10220.8455    DFE                          16
MSE                   638.80284    Root MSE               25.27455
SBC                  193.756692    AIC                  189.773763
MAE                  18.1317764    AICC                  192.44043
MAPE                  21.149176    HQC                  190.551273
Durbin-Watson            1.3523    Regress R-Square         0.5511
                                   Total R-Square           0.7721

                              Parameter Estimates

                                 Standard           Approx
Variable    DF   Estimate           Error   t Value   Pr > |t|   Variable Label
Intercept    1   -18.6582         34.8101     -0.54     0.5993
gef          1     0.0339          0.0179      1.89     0.0769   Lagged Value of GE shares
gec          1     0.1369          0.0449      3.05     0.0076   Lagged Capital Stock GE
AR1          1    -0.4996          0.2592     -1.93     0.0718

                     Autoregressive parameters assumed given

                                 Standard           Approx
Variable    DF   Estimate           Error   t Value   Pr > |t|   Variable Label
Intercept    1   -18.6582         33.7567     -0.55     0.5881
gef          1     0.0339          0.0159      2.13     0.0486   Lagged Value of GE shares
gec          1     0.1369          0.0404      3.39     0.0037   Lagged Capital Stock GE
Output 8.2.4  Regression Results Using Maximum Likelihood Method

                     Estimates of Autoregressive Parameters

                                Standard
Lag      Coefficient               Error      t Value
  1        -0.460867            0.221867        -2.08

Algorithm converged.

                          Maximum Likelihood Estimates

SSE                  10229.2303    DFE                          16
MSE                   639.32689    Root MSE               25.28491
SBC                  193.738877    AIC                  189.755947
MAE                  18.0892426    AICC                 192.422614
MAPE                 21.0978407    HQC                  190.533457
Durbin-Watson            1.3385    Regress R-Square         0.5656
                                   Total R-Square           0.7719

                              Parameter Estimates

                                 Standard           Approx
Variable    DF   Estimate           Error   t Value   Pr > |t|   Variable Label
Intercept    1   -18.3751         34.5941     -0.53     0.6026
gef          1     0.0334          0.0179      1.87     0.0799   Lagged Value of GE shares
gec          1     0.1385          0.0428      3.23     0.0052   Lagged Capital Stock GE
AR1          1    -0.4728          0.2582     -1.83     0.0858

                     Autoregressive parameters assumed given

                                 Standard           Approx
Variable    DF   Estimate           Error   t Value   Pr > |t|   Variable Label
Intercept    1   -18.3751         33.3931     -0.55     0.5897
gef          1     0.0334          0.0158      2.11     0.0512   Lagged Value of GE shares
gec          1     0.1385          0.0389      3.56     0.0026   Lagged Capital Stock GE
Example 8.3: Lack-of-Fit Study

Many time series exhibit high positive autocorrelation, having the smooth appearance of a random walk. This behavior can be explained by the partial adjustment and adaptive expectation hypotheses. Short-term forecasting applications often use autoregressive models because these models absorb the behavior of this kind of data. In the case of a first-order AR process where the autoregressive parameter is exactly 1 (a random walk), the best prediction of the future is the immediate past.

PROC AUTOREG can often greatly improve the fit of models, not only by adding additional parameters but also by capturing the random walk tendencies. Thus, PROC AUTOREG can be expected to provide good short-term forecast predictions. However, good forecasts do not necessarily mean that your structural model contributes anything worthwhile to the fit. In the following example, random noise is fit to part of a sine wave. Notice that the structural model does not fit at all, but the autoregressive process does quite well and is very nearly a first difference (AR(1) = −.976).

The DATA step, PROC AUTOREG step, and PROC SGPLOT step follow:

   title1 'Lack of Fit Study';
   title2 'Fitting White Noise Plus Autoregressive Errors to a Sine Wave';

   data a;
      pi=3.14159;
      do time = 1 to 75;
         if time > 75 then y = .;
         else y = sin( pi * ( time / 50 ) );
         x = ranuni( 1234567 );
         output;
      end;
   run;

   proc autoreg data=a plots;
      model y = x / nlag=1;
      output out=b p=pred pm=xbeta;
   run;

   proc sgplot data=b;
      scatter y=y x=time / markerattrs=(color=black);
      series y=pred x=time / lineattrs=(color=blue);
      series y=xbeta x=time / lineattrs=(color=red);
   run;
The printed output produced by PROC AUTOREG is shown in Output 8.3.1 and Output 8.3.2. Plots of observed and predicted values are shown in Output 8.3.3 and Output 8.3.4. Note: the plot Output 8.3.3 can be viewed in the Autoreg.Model.FitDiagnosticPlots category by selecting View > Results.
Output 8.3.1  Results of OLS Analysis: No Autoregressive Model Fit

                              Lack of Fit Study
          Fitting White Noise Plus Autoregressive Errors to a Sine Wave

                             The AUTOREG Procedure

Dependent Variable    y

                        Ordinary Least Squares Estimates

SSE                  34.8061005    DFE                          73
MSE                     0.47680    Root MSE                0.69050
SBC                  163.898598    AIC                  159.263622
MAE                  0.59112447    AICC                 159.430289
MAPE                 117894.045    HQC                  161.114317
Durbin-Watson            0.0057    Regress R-Square         0.0008
                                   Total R-Square           0.0008

                              Parameter Estimates

                                   Standard             Approx
Variable     DF     Estimate          Error    t Value    Pr > |t|
Intercept     1       0.2383         0.1584       1.50      0.1367
x             1      -0.0665         0.2771      -0.24      0.8109

                         Estimates of Autocorrelations

Lag   Covariance   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
  0       0.4641      1.000000      |                    |********************|
  1       0.4531      0.976386      |                    |********************|

Preliminary MSE    0.0217

Output 8.3.2  Regression Results with AR(1) Error Correction

                     Estimates of Autoregressive Parameters

                                Standard
Lag      Coefficient               Error      t Value
  1        -0.976386            0.025460       -38.35

                            Yule-Walker Estimates

SSE                  0.18304264    DFE                          72
MSE                     0.00254    Root MSE                0.05042
SBC                  -222.30643    AIC                   -229.2589
MAE                  0.04551667    AICC                 -228.92087
MAPE                 29145.3526    HQC                  -226.48285
Durbin-Watson            0.0942    Regress R-Square         0.0001
                                   Total R-Square           0.9947
Output 8.3.2 continued

                              Parameter Estimates

                                   Standard             Approx
Variable     DF     Estimate          Error    t Value    Pr > |t|
Intercept     1      -0.1473         0.1702      -0.87      0.3898
x             1    -0.001219         0.0141      -0.09      0.9315

Output 8.3.3  Diagnostics Plots
Output 8.3.4 Plot of Autoregressive Prediction
Example 8.4: Missing Values

In this example, a pure autoregressive error model with no regressors is used to generate 50 values of a time series. Approximately 15% of the values are randomly chosen and set to missing. The following statements generate the data:

   title 'Simulated Time Series with Roots:';
   title2 ' (X-1.25)(X**4-1.25)';
   title3 'With 15% Missing Values';

   data ar;
      do i=1 to 550;
         e = rannor(12345);
         n = sum( e, .8*n1, .8*n4, -.64*n5 );   /* ar process    */
         y = n;
         if ranuni(12345) > .85 then y = .;     /* 15% missing   */
         n5=n4; n4=n3; n3=n2; n2=n1; n1=n;      /* set lags      */
         if i>500 then output;
      end;
   run;
The model is estimated using maximum likelihood, and the residuals are plotted with 99% confidence limits. The PARTIAL option prints the partial autocorrelations. The following statements fit the model:

   proc autoreg data=ar partial;
      model y = / nlag=(1 4 5) method=ml;
      output out=a predicted=p residual=r ucl=u lcl=l alphacli=.01;
   run;
The printed output produced by the AUTOREG procedure is shown in Output 8.4.1 and Output 8.4.2. Note: the plot Output 8.4.2 can be viewed in the Autoreg.Model.FitDiagnosticPlots category by selecting View > Results.
Output 8.4.1 Autocorrelation-Corrected Regression Results

               Simulated Time Series with Roots:
                      (X-1.25)(X**4-1.25)
                    With 15% Missing Values

                     The AUTOREG Procedure

                  Dependent Variable    y

               Ordinary Least Squares Estimates

   SSE              182.972379    DFE                        40
   MSE                 4.57431    Root MSE              2.13876
   SBC               181.39282    AIC                179.679248
   MAE              1.80469152    AICC               179.781813
   MAPE             270.104379    HQC                180.303237
   Durbin-Watson        1.3962    Regress R-Square       0.0000
                                  Total R-Square         0.0000

                     Parameter Estimates

                                 Standard            Approx
   Variable     DF   Estimate       Error   t Value  Pr > |t|
   Intercept     1    -2.2387      0.3340     -6.70    <.0001

                  Maximum Likelihood Estimates

                     Parameter Estimates

                                 Standard            Approx
   Variable     DF   Estimate       Error   t Value  Pr > |t|
   Intercept     1    -2.2370      0.5239     -4.27    0.0001
   AR1           1    -0.6201      0.1129     -5.49    <.0001
   AR4           1    -0.7237      0.0914     -7.92    <.0001
   AR5           1     0.6550      0.1202      5.45    <.0001
Each item is a constant, a parameter name, or a list of parameter names. Each operator is <, <=, >, >=, or =. Parameter names are as shown in the ESTIMATE column of the "Parameter Estimates" table or can be seen in the OUTEST= data set. You can use both the BOUNDS statement and the RESTRICT statement to impose boundary constraints; however, the BOUNDS statement provides a simpler syntax for specifying these kinds of constraints. See also the section "RESTRICT Statement" on page 529.

The following BOUNDS statement constrains the estimates of the parameter for z to be negative, the parameters for x1 through x10 to be between zero and one, and the parameter for x1 in the zero-inflation model to be less than one:

   bounds z < 0,
          0 < x1-x10 < 1,
          Inf_x1 < 1;
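For illustration, a minimal sketch of the BOUNDS statement in a full PROC COUNTREG call; it assumes the docvisit data set described in Example 10.1, and the bound values are arbitrary:

   proc countreg data=docvisit;
      model doctorco = sex illness income / dist=negbin;
      /* keep the income coefficient nonpositive and bound the dispersion */
      bounds income <= 0,
             0 < _Alpha < 5;
   run;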
BY Statement

   BY variables ;
A BY statement can be used with PROC COUNTREG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the input data set should be sorted in the order of the BY variables.
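A minimal sketch of BY-group processing, again assuming the docvisit data set from Example 10.1:

   proc sort data=docvisit;
      by sex;
   run;

   /* fits one model per BY group (here, one for each value of sex) */
   proc countreg data=docvisit;
      by sex;
      model doctorco = illness income / dist=poisson;
   run;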
CLASS Statement

   CLASS variables ;
The CLASS statement names the classification variables that are used to group (classify) data in the analysis. Classification variables can be either character or numeric. Class levels are determined from the formatted values of the CLASS variables. Thus, you can use formats to group values into levels. See the discussion of the FORMAT procedure in the SAS Language Reference: Dictionary for details. The CLASS statement must precede the MODEL statement.
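A sketch that declares a numeric variable as a classification variable (docvisit and its 0/1 variable sex are from Example 10.1):

   proc countreg data=docvisit;
      class sex;                      /* levels formed from formatted values */
      model doctorco = sex illness income / dist=poisson;
   run;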
FREQ Statement

   FREQ variable ;
The FREQ statement specifies a variable whose values represent the frequency of occurrence of each observation. PROC COUNTREG treats each observation as if it appears n times, where n is the value of the FREQ variable for the observation. If the frequency value is not an integer, it is truncated to an integer; if it is less than 1 or missing, the observation is not used in the model fitting. When the FREQ statement is not specified, each observation is assigned a frequency of 1. If you specify more than one FREQ statement, then the first statement is used.
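A sketch, assuming a hypothetical data set visitfreq in which the variable ncases records how many subjects share each covariate pattern:

   proc countreg data=visitfreq;      /* visitfreq and ncases are hypothetical */
      freq ncases;
      model doctorco = illness income / dist=poisson;
   run;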
INIT Statement

   INIT initvalue1 < , initvalue2 . . . > ;
The INIT statement sets initial values for parameters in the optimization. Each initvalue is written as a parameter or parameter list, followed by an optional equal sign (=), followed by a number: parameter < = > number
For continuous regressors, the names of the parameters are the same as the corresponding variables. For a regressor that is a CLASS variable, the parameter name combines the corresponding CLASS variable name with the variable level. For interaction and nested regressors, the parameter names combine the names of each regressor. The names of the parameters can be seen in the OUTEST= data set. By default, initial values are determined by OLS regression. Initial values can be displayed with the ITPRINT option in the PROC statement.
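A sketch of user-supplied starting values, assuming the docvisit variables from Example 10.1; the initial values themselves are arbitrary:

   proc countreg data=docvisit;
      model doctorco = illness income / dist=negbin itprint;
      init illness = 0.25, income = -0.25, _Alpha = 1;
   run;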
MODEL Statement

   MODEL dependent = < regressors > < / options > ;

The MODEL statement specifies the dependent variable and independent covariates (regressors) for the regression model. If you specify no regressors, PROC COUNTREG fits a model that contains only an intercept. The dependent count variable should take on only nonnegative integer values in the input data set. PROC COUNTREG rounds any positive noninteger count values to the nearest integer. PROC COUNTREG ignores any observations with a negative count.

Only one MODEL statement can be specified. The following options can be used in the MODEL statement after a slash (/).

DIST=value
   specifies a type of model to be analyzed. If you specify this option in both the MODEL statement and the PROC COUNTREG statement, then only the value in the MODEL statement is used. The following model types are supported:

   POISSON | P            Poisson regression model
   NEGBIN(P=1)            negative binomial regression model with a linear
                          variance function
   NEGBIN(P=2) | NEGBIN   negative binomial regression model with a quadratic
                          variance function
   ZIPOISSON | ZIP        zero-inflated Poisson regression. The ZEROMODEL
                          statement must be specified when this model type is
                          specified.
   ZINEGBIN | ZINB        zero-inflated negative binomial regression. The
                          ZEROMODEL statement must be specified when this
                          model type is specified.

NOINT
   suppresses the intercept parameter.

OFFSET=variable
   specifies a variable in the input data set to be used as an offset variable. The offset variable appears as a covariate in the model with its parameter restricted to 1. The offset variable cannot be the response variable, the zero-inflation offset variable (if any), or one of the explanatory variables. The "Model Fit Summary" gives the name of the data set variable used as the offset variable; it is labeled as "Offset."

Printing Options

CORRB
   prints the correlation matrix of the parameter estimates. The CORRB option can also be specified in the PROC COUNTREG statement.

COVB
   prints the covariance matrix of the parameter estimates. The COVB option can also be specified in the PROC COUNTREG statement.

ITPRINT
   prints the objective function and parameter estimates at each iteration. The objective function is the negative log-likelihood function. The ITPRINT option can also be specified in the PROC COUNTREG statement.

PRINTALL
   requests all printing options. The PRINTALL option can also be specified in the PROC COUNTREG statement.
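A sketch of the DIST= and OFFSET= options together; the data set visits2 and its log-exposure variable logdays are hypothetical:

   proc countreg data=visits2;        /* visits2 and logdays are hypothetical */
      /* logdays enters with coefficient fixed at 1, adjusting for exposure */
      model doctorco = illness income / dist=negbin offset=logdays;
   run;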
NLOPTIONS Statement

   NLOPTIONS < options > ;
The NLOPTIONS statement provides the options to control the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks. For a list of all the options of the NLOPTIONS statement, see Chapter 6, “Nonlinear Optimization Methods.”
OUTPUT Statement

   OUTPUT < OUT=SAS-data-set > < output-options > ;

The OUTPUT statement creates a new SAS data set that contains all the variables in the input data set and, optionally, the estimates of $x_i'\beta$, the expected value of the response variable, and the probability that the response variable will take on the current value or other values that you specify. In a zero-inflated model, you can additionally request that the output data set contain the estimates of $z_i'\gamma$ and the probability that the response is zero as a result of the zero-generating process. Except for the probability of the current value, these statistics can be computed for all observations in which the regressors are not missing, even if the response is missing. By adding observations with missing response values to the input data set, you can compute these statistics for new observations or for settings of the regressors that are not present in the data without affecting the model fit.

You can specify only one OUTPUT statement. You can specify the following OUTPUT statement options:

OUT=SAS-data-set
   names the output data set.

XBETA=name
   names the variable that contains estimates of $x_i'\beta$.

PRED=name
   names the variable that contains the predicted value of the response variable.

PROB=name
   names the variable that contains the probability of the response variable taking the current value, $\Pr(Y = y_i)$.
PROBCOUNT(value1 < value2 . . . >)
   outputs the probability of the response variable taking particular values. Each value should be a nonnegative integer. Nonintegers are rounded to the nearest integer. value can also be a list of the form X TO Y BY Z. For example, PROBCOUNT(0 1 2 TO 10 BY 2 15) requests predicted probabilities for counts 0, 1, 2, 4, 6, 8, 10, and 15.

ZGAMMA=name
   names the variable that contains estimates of $z_i'\gamma$.

PROBZERO=name
   names the variable that contains the value of $\varphi_i$, the probability that the response variable will take on the value of zero as a result of the zero-generating process. It is written to the output file only if the model is zero-inflated. Note that this is not the overall probability of a zero response. That is provided by the PROBCOUNT(0) option.
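A sketch of a typical OUTPUT statement for a zero-inflated fit; the output data set and variable names are illustrative, and the docvisit data are from Example 10.1:

   proc countreg data=docvisit;
      model doctorco = sex illness income / dist=zip;
      zeromodel doctorco ~ income;
      output out=pred xbeta=xb pred=mean prob=p_obs
             probcount(0 1 2) zgamma=zg probzero=p0;
   run;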
RESTRICT Statement

   RESTRICT restriction1 < , restriction2 . . . > ;

The RESTRICT statement imposes linear restrictions on the parameter estimates. You can specify any number of RESTRICT statements. Each restriction is written as an expression, followed by an equality operator (=) or an inequality operator (<, <=, >, >=), followed by a second expression:

   expression operator expression

The operator can be =, <, <=, >, or >=. Restriction expressions can be composed of parameter names, constants, and the operators times (*), plus (+), and minus (-). The restriction expressions must be a linear function of the parameters. For continuous regressors, the names of the parameters are the same as the corresponding variables. For a regressor that is a CLASS variable, the parameter name combines the corresponding CLASS variable name with the variable level. For interaction and nested regressors, the parameter names combine the names of each regressor. The names of the parameters can be seen in the OUTEST= data set.

Lagrange multipliers are reported in the "Parameter Estimates" table for all the active linear constraints. They are identified with the names Restrict1, Restrict2, and so on. The probabilities of these Lagrange multipliers are computed using a beta distribution (LaMotte 1994). Nonactive (nonbinding) restrictions have no effect on the estimation results and are not noted in the output.

The following RESTRICT statement constrains the negative binomial dispersion parameter $\alpha$ to 1, which restricts the conditional variance to be $\mu + \mu^2$:

   restrict _Alpha = 1;
WEIGHT Statement

   WEIGHT variable < / option > ;
The WEIGHT statement specifies a variable to supply weighting values to use for each observation in estimating parameters. The log likelihood for each observation is multiplied by the corresponding weight variable value. If the weight of an observation is nonpositive, that observation is not used in the estimation. The following option can be added to the WEIGHT statement after a slash (/). NONORMALIZE
does not normalize the weights. By default, the weights are normalized so that they add up to the actual sample size. Weights $w_i$ are normalized by multiplying them by $n / \sum_{i=1}^{n} W_i$, where $n$ is the sample size. If the weights are required to be used as is, then specify the NONORMALIZE option.
ZEROMODEL Statement

   ZEROMODEL dependent-variable ~ zero-inflated-regressors < / options > ;

The ZEROMODEL statement is required if either ZIP or ZINB is specified in the DIST= option in the MODEL statement. If ZIP or ZINB is specified, then the ZEROMODEL statement must follow immediately after the MODEL statement. The dependent variable in the ZEROMODEL statement must be the same as the dependent variable in the MODEL statement.

The zero-inflated (ZI) regressors appear in the equation that determines the probability ($\varphi_i$) of a zero count. Each of these $q$ variables has a parameter to be estimated in the regression. For example, let $z_i'$ be the $i$th observation's $1 \times (q+1)$ vector of values of the $q$ ZI explanatory variables ($z_0$ is set to 1 for the intercept term). Then $\varphi_i$ is a function of $z_i'\gamma$, where $\gamma$ is the $(q+1) \times 1$ vector of parameters to be estimated. (The ZI intercept is $\gamma_0$; the coefficients for the $q$ ZI covariates are $\gamma_1, \ldots, \gamma_q$.) If this option is omitted, then only the intercept term $\gamma_0$ is estimated.

The "Parameter Estimates" table in the displayed output gives the estimates for the ZI intercept and ZI explanatory variables; they are labeled with the prefix "Inf_". For example, the ZI intercept is labeled "Inf_intercept". If you specify Age (a variable in your data set) as a ZI explanatory variable, then the "Parameter Estimates" table labels the corresponding parameter estimate "Inf_Age".

The following options can be specified in the ZEROMODEL statement following a slash (/):

LINK=value
   specifies the distribution function used to compute the probability of zeros. The following distribution functions are supported:

   LOGISTIC    specifies the logistic distribution.
   NORMAL      specifies the standard normal distribution.
   If this option is omitted, then the default ZI link function is logistic.

OFFSET=variable
   specifies a variable in the input data set to be used as a zero-inflated (ZI) offset variable. The ZI offset variable is included as a term, with coefficient restricted to 1, in the equation that determines the probability ($\varphi_i$) of a zero count. The ZI offset variable cannot be the response variable, the offset variable (if any), or one of the explanatory variables. The name of the data set variable used as the ZI offset variable is displayed in the "Model Fit Summary" output, where it is labeled as "Inf_offset".
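A sketch that pairs the MODEL and ZEROMODEL statements and requests the probit ZI link; the data set is docvisit from Example 10.1, and the choice of ZI covariates is arbitrary:

   proc countreg data=docvisit;
      model doctorco = sex illness income / dist=zip;
      zeromodel doctorco ~ income hscore / link=normal;
   run;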
Details: COUNTREG Procedure
Specification of Regressors

Each term in a model, called a regressor, is a variable or combination of variables. Regressors are specified with a special notation that uses variable names and operators. There are two kinds of variables: classification (CLASS) variables and continuous variables. There are two primary operators: crossing and nesting. A third operator, the bar operator, is used to simplify effect specification.

In the SAS System, classification (CLASS) variables are declared in the CLASS statement. (They can also be called categorical, qualitative, discrete, or nominal variables.) Classification variables can be either numeric or character. The values of a classification variable are called levels. For example, the classification variable Sex has the levels "male" and "female."

In a model, an independent variable that is not declared in the CLASS statement is assumed to be continuous. Continuous variables, which must be numeric, are used for response variables and covariates. For example, the heights and weights of subjects are continuous variables.
Types of Regressors

Seven different types of regressors are used in the COUNTREG procedure. In the following list, assume that A, B, C, D, and E are CLASS variables and that X1, X2, and Y are continuous variables:

   - Regressors are specified by writing continuous variables by themselves: X1 X2.
   - Polynomial regressors are specified by joining (crossing) two or more continuous variables with asterisks: X1*X1 X1*X2.
   - Dummy regressors are specified by writing CLASS variables by themselves: A B C.
   - Dummy interactions are specified by joining classification variables with asterisks: A*B B*C A*B*C.
   - Nested regressors are specified by following a dummy variable or dummy interaction with a classification variable or list of classification variables enclosed in parentheses. The dummy variable or dummy interaction is nested within the regressor listed in parentheses: B(A) C(B*A) D*E(C*B*A). In this example, B(A) is read "B nested within A."
   - Continuous-by-class regressors are written by joining continuous variables and classification variables with asterisks: X1*A.
   - Continuous-nesting-class regressors consist of continuous variables followed by a classification variable interaction enclosed in parentheses: X1(A) X1*X2(A*B).

One example of the general form of an effect that involves several variables is

   X1*X2*A*B*C(D*E)
This example contains interacting continuous terms with classification terms that are nested within more than one classification variable. The continuous list comes first, followed by the dummy list, followed by the nesting list in parentheses. Note that asterisks can appear within the nested list but not immediately before the left parenthesis.

The MODEL statement and several other statements use these effects. Some examples of MODEL statements that use various kinds of effects are shown in the following table, where a, b, and c represent classification variables and y, y1, y2, x, and z represent continuous variables.

   Specification              Type of Model
   model y=x;                 Simple regression
   model y=x z;               Multiple regression
   model y=x x*x;             Polynomial regression
   model y=a;                 Regression with one classification variable
   model y=a b c;             Regression with multiple classification variables
   model y=a b a*b;           Regression with classification variables and
                              their interactions
   model y=a b(a) c(b a);     Regression with classification variables and
                              their interactions
   model y=a x;               Regression with both continuous and
                              classification variables
   model y=a x(a);            Separate-slopes regression
   model y=a x x*a;           Homogeneity-of-slopes regression
The Bar Operator

You can shorten the specification of a large factorial model by using the bar operator. For example, two ways of writing the model for a full three-way factorial model follow:

   model Y = A B C A*B A*C B*C A*B*C;

   model Y = A|B|C;
When the bar (|) is used, the right and left sides become effects, and the cross of them becomes an effect. Multiple bars are permitted. The expressions are expanded from left to right, using rules 2-4 given in Searle (1971, p. 390). Multiple bars are evaluated from left to right. For instance, A|B|C is evaluated as follows:

   A|B|C  ->  {A|B}|C  ->  {A B A*B}|C  ->  A B A*B C A*C B*C A*B*C
Crossed and nested groups of variables are combined. For example, A(B) | C(D) generates A*C(B D), among other terms. Duplicate variables are removed. For example, A(C) | B(C) generates A*B(C C), among other terms, and the extra C is removed. Effects are discarded if a variable occurs on both the crossed and nested parts of an effect. For instance, A(B) | B(D E) generates A*B(B D E), but this effect is eliminated immediately.

You can also specify the maximum number of variables involved in any effect that results from bar evaluation by specifying that maximum number, preceded by an @ sign, at the end of the bar effect. For example, the specification A | B | C@2 would result in only those effects that contain two or fewer variables: in this case, A B A*B C A*C and B*C.

More examples of using the | and @ operators follow; a usage sketch appears after this list:

   A | C(B)          is equivalent to   A  C(B)  A*C(B)
   A(B) | C(B)       is equivalent to   A(B)  C(B)  A*C(B)
   A(B) | B(D E)     is equivalent to   A(B)  B(D E)
   A | B(A) | C      is equivalent to   A  B(A)  C  A*C  B*C(A)
   A | B(A) | C@2    is equivalent to   A  B(A)  C  A*C
   A | B | C | D@2   is equivalent to   A  B  A*B  C  A*C  B*C  D  A*D  B*D  C*D
   A*B(C*D)          is equivalent to   A*B(C D)
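A sketch of bar-operator shorthand in a COUNTREG MODEL statement; sex is from the docvisit data of Example 10.1, and the CLASS variable agegrp is hypothetical:

   proc countreg data=docvisit;
      class sex agegrp;                         /* agegrp is hypothetical */
      /* sex|agegrp expands to: sex agegrp sex*agegrp */
      model doctorco = sex|agegrp illness / dist=poisson;
   run;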
Missing Values

Any observation in the input data set with a missing value for one or more of the regressors is ignored by PROC COUNTREG and not used in the model fit. PROC COUNTREG rounds any positive noninteger count values to the nearest integer. PROC COUNTREG ignores any observations with a negative count, a zero or negative weight, or a frequency less than 1.

If there are observations in the input data set with missing response values but with nonmissing regressors, PROC COUNTREG can compute several statistics and store them in an output data set by using the OUTPUT statement. For example, you can request that the output data set contain the estimates of $x_i'\beta$, the expected value of the response variable, and the probability of the response variable taking on values that you specify. In a zero-inflated model, you can additionally request that the output data set contain the estimates of $z_i'\gamma$, and the probability that the response is zero as a result of the zero-generating process. The presence of such observations (with missing response values) does not affect the model fit.
Poisson Regression

The most widely used model for count data analysis is Poisson regression. This assumes that $y_i$, given the vector of covariates $x_i$, is independently Poisson-distributed with

$$P(Y_i = y_i \mid x_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

and the mean parameter (that is, the mean number of events per period) is given by

$$\mu_i = \exp(x_i'\beta)$$

where $\beta$ is a $(k+1) \times 1$ parameter vector. (The intercept is $\beta_0$; the coefficients for the $k$ regressors are $\beta_1, \ldots, \beta_k$.) Taking the exponential of $x_i'\beta$ ensures that the mean parameter $\mu_i$ is nonnegative. It can be shown that the conditional mean is given by

$$E(y_i \mid x_i) = \mu_i = \exp(x_i'\beta)$$

The name log-linear model is also used for the Poisson regression model since the logarithm of the conditional mean is linear in the parameters:

$$\ln[E(y_i \mid x_i)] = \ln(\mu_i) = x_i'\beta$$

Note that the conditional variance of the count random variable is equal to the conditional mean in the Poisson regression model:

$$V(y_i \mid x_i) = E(y_i \mid x_i) = \mu_i$$

The equality of the conditional mean and variance of $y_i$ is known as equidispersion.

The marginal effect of a regressor is given by

$$\frac{\partial E(y_i \mid x_i)}{\partial x_{ji}} = \exp(x_i'\beta)\beta_j = E(y_i \mid x_i)\beta_j$$

Thus, a one-unit change in the $j$th regressor leads to a proportional change in the conditional mean $E(y_i \mid x_i)$ of $\beta_j$.

The standard estimator for the Poisson model is the maximum likelihood estimator (MLE). Since the observations are independent, the log-likelihood function is written as

$$\mathcal{L} = \sum_{i=1}^{N} w_i\left(-\mu_i + y_i\ln\mu_i - \ln y_i!\right) = \sum_{i=1}^{N} w_i\left(-e^{x_i'\beta} + y_i x_i'\beta - \ln y_i!\right)$$

where $w_i$ is defined as follows:

   $w_i = 1$, if neither the WEIGHT nor the FREQ statement is used
   $w_i = W_i$, where $W_i$ are the nonnormalized values of the variable specified in the WEIGHT statement in which the NONORMALIZE option is specified
   $w_i = \frac{n}{\sum_{i=1}^{n} W_i} W_i$, where $W_i$ are the nonnormalized values of the variable specified in the WEIGHT statement
   $w_i = F_i$, where $F_i$ are the values of the variable specified in the FREQ statement
   $w_i = W_i F_i$, if both the WEIGHT statement, without the NONORMALIZE option, and the FREQ statement are specified
   $w_i = \frac{\sum_{i=1}^{n} F_i}{\sum_{i=1}^{n} W_i F_i} W_i F_i$, if both the FREQ and the WEIGHT statements are specified

The gradient and the Hessian are, respectively,

$$\frac{\partial\mathcal{L}}{\partial\beta} = \sum_{i=1}^{N} w_i(y_i - \mu_i)x_i = \sum_{i=1}^{N} w_i\left(y_i - e^{x_i'\beta}\right)x_i$$

$$\frac{\partial^2\mathcal{L}}{\partial\beta\,\partial\beta'} = -\sum_{i=1}^{N} w_i\mu_i x_i x_i' = -\sum_{i=1}^{N} w_i e^{x_i'\beta} x_i x_i'$$
The Poisson model has been criticized for its restrictive property that the conditional variance equals the conditional mean. Real-life data are often characterized by overdispersion (that is, the variance exceeds the mean). Allowing for overdispersion can improve model predictions since the Poisson restriction of equal mean and variance results in the underprediction of zeros when overdispersion exists. The most commonly used model that accounts for overdispersion is the negative binomial model.
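The symptom is easy to reproduce. The following sketch (all names and parameter values are illustrative) simulates gamma-mixed Poisson counts and fits a negative binomial model; a significantly positive _Alpha estimate signals the overdispersion discussed above:

   /* Simulate overdispersed counts: Poisson mean scaled by gamma noise */
   data sim;
      call streaminit(42);
      do i = 1 to 2000;
         x  = rand('normal');
         nu = rand('gamma', 2) / 2;        /* unobserved heterogeneity, E(nu)=1 */
         mu = exp(0.5 + 0.8*x) * nu;       /* conditional Poisson mean          */
         y  = rand('poisson', mu);
         output;
      end;
   run;

   proc countreg data=sim;
      model y = x / dist=negbin;           /* t test on _Alpha tests alpha = 0 */
   run;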
Negative Binomial Regression

The Poisson regression model can be generalized by introducing an unobserved heterogeneity term for observation $i$. Thus, the individuals are assumed to differ randomly in a manner that is not fully accounted for by the observed covariates. This is formulated as

$$E(y_i \mid x_i, \nu_i) = \mu_i\nu_i = e^{x_i'\beta + \varepsilon_i}$$

where the unobserved heterogeneity term $\nu_i = e^{\varepsilon_i}$ is independent of the vector of regressors $x_i$. Then the distribution of $y_i$ conditional on $x_i$ and $\nu_i$ is Poisson with conditional mean and conditional variance $\mu_i\nu_i$:

$$f(y_i \mid x_i, \nu_i) = \frac{\exp(-\mu_i\nu_i)(\mu_i\nu_i)^{y_i}}{y_i!}$$

Let $g(\nu_i)$ be the probability density function of $\nu_i$. Then, the distribution $f(y_i \mid x_i)$ (no longer conditional on $\nu_i$) is obtained by integrating $f(y_i \mid x_i, \nu_i)$ with respect to $\nu_i$:

$$f(y_i \mid x_i) = \int_0^{\infty} f(y_i \mid x_i, \nu_i)\,g(\nu_i)\,d\nu_i$$

An analytical solution to this integral exists when $\nu_i$ is assumed to follow a gamma distribution. This solution is the negative binomial distribution. When the model contains a constant term, it is necessary to assume that $E(e^{\varepsilon_i}) = E(\nu_i) = 1$, in order to identify the mean of the distribution. Thus, it is assumed that $\nu_i$ follows a gamma$(\theta, \theta)$ distribution with $E(\nu_i) = 1$ and $V(\nu_i) = 1/\theta$,

$$g(\nu_i) = \frac{\theta^{\theta}}{\Gamma(\theta)}\nu_i^{\theta-1}\exp(-\theta\nu_i)$$

where $\Gamma(x) = \int_0^{\infty} z^{x-1}\exp(-z)\,dz$ is the gamma function and $\theta$ is a positive parameter. Then, the density of $y_i$ given $x_i$ is derived as

$$f(y_i \mid x_i) = \int_0^{\infty} f(y_i \mid x_i, \nu_i)\,g(\nu_i)\,d\nu_i$$
$$= \frac{\mu_i^{y_i}\theta^{\theta}}{y_i!\,\Gamma(\theta)}\int_0^{\infty} e^{-(\mu_i+\theta)\nu_i}\nu_i^{\theta+y_i-1}\,d\nu_i$$
$$= \frac{\mu_i^{y_i}\theta^{\theta}\,\Gamma(\theta+y_i)}{y_i!\,\Gamma(\theta)\,(\theta+\mu_i)^{\theta+y_i}}$$
$$= \frac{\Gamma(\theta+y_i)}{y_i!\,\Gamma(\theta)}\left(\frac{\theta}{\theta+\mu_i}\right)^{\theta}\left(\frac{\mu_i}{\theta+\mu_i}\right)^{y_i}$$

Making the substitution $\alpha = \frac{1}{\theta}$ ($\alpha > 0$), the negative binomial distribution can then be rewritten as

$$f(y_i \mid x_i) = \frac{\Gamma(y_i+\alpha^{-1})}{y_i!\,\Gamma(\alpha^{-1})}\left(\frac{\alpha^{-1}}{\alpha^{-1}+\mu_i}\right)^{\alpha^{-1}}\left(\frac{\mu_i}{\alpha^{-1}+\mu_i}\right)^{y_i}, \qquad y_i = 0, 1, 2, \ldots$$

Thus, the negative binomial distribution is derived as a gamma mixture of Poisson random variables. It has conditional mean

$$E(y_i \mid x_i) = \mu_i = e^{x_i'\beta}$$

and conditional variance

$$V(y_i \mid x_i) = \mu_i\left[1 + \frac{1}{\theta}\mu_i\right] = \mu_i[1 + \alpha\mu_i] > E(y_i \mid x_i)$$

The conditional variance of the negative binomial distribution exceeds the conditional mean. Overdispersion results from neglected unobserved heterogeneity. The negative binomial model with variance function $V(y_i \mid x_i) = \mu_i + \alpha\mu_i^2$, which is quadratic in the mean, is referred to as the NEGBIN2 model (Cameron and Trivedi 1986). To estimate this model, specify DIST=NEGBIN(p=2) in the MODEL statement. The Poisson distribution is a special case of the negative binomial distribution where $\alpha = \theta^{-1} = 0$. A test of the Poisson distribution can be carried out by testing the hypothesis that $\alpha = \theta^{-1} = 0$. A Wald test of this hypothesis is provided (it is the reported $t$ statistic for the estimated $\alpha$ in the negative binomial model).

The log-likelihood function of the negative binomial regression model (NEGBIN2) is given by

$$\mathcal{L} = \sum_{i=1}^{N} w_i\left\{\sum_{j=0}^{y_i-1}\ln(j+\alpha^{-1}) - \ln(y_i!) - (y_i+\alpha^{-1})\ln(1+\alpha\exp(x_i'\beta)) + y_i\ln(\alpha) + y_i x_i'\beta\right\}$$

where the relation

$$\Gamma(y+a)/\Gamma(a) = \prod_{j=0}^{y-1}(j+a)$$

is used if $y$ is an integer. See "Poisson Regression" on page 534 for the definition of $w_i$. The gradient is

$$\frac{\partial\mathcal{L}}{\partial\beta} = \sum_{i=1}^{N} w_i\,\frac{y_i-\mu_i}{1+\alpha\mu_i}\,x_i$$

and

$$\frac{\partial\mathcal{L}}{\partial\alpha} = \sum_{i=1}^{N} w_i\left\{-\alpha^{-2}\sum_{j=0}^{y_i-1}\frac{1}{j+\alpha^{-1}} + \alpha^{-2}\ln(1+\alpha\mu_i) + \frac{y_i-\mu_i}{\alpha(1+\alpha\mu_i)}\right\}$$

Cameron and Trivedi (1986) consider a general class of negative binomial models with mean $\mu_i$ and variance function $\mu_i + \alpha\mu_i^{p}$. The NEGBIN2 model, with $p = 2$, is the standard formulation of the negative binomial model. Models with other values of $p$, $-\infty < p < \infty$, have the same density $f(y_i \mid x_i)$ except that $\alpha^{-1}$ is replaced everywhere by $\alpha^{-1}\mu_i^{2-p}$. The negative binomial model NEGBIN1, which sets $p = 1$, has variance function $V(y_i \mid x_i) = \mu_i + \alpha\mu_i$, which is linear in the mean. To estimate this model, specify DIST=NEGBIN(p=1) in the MODEL statement.

The log-likelihood function of the NEGBIN1 regression model is given by

$$\mathcal{L} = \sum_{i=1}^{N} w_i\left\{\sum_{j=0}^{y_i-1}\ln\left(j+\alpha^{-1}\exp(x_i'\beta)\right) - \ln(y_i!) - \left(y_i+\alpha^{-1}\exp(x_i'\beta)\right)\ln(1+\alpha) + y_i\ln(\alpha)\right\}$$

See "Poisson Regression" on page 534 for the definition of $w_i$. The gradient is

$$\frac{\partial\mathcal{L}}{\partial\beta} = \sum_{i=1}^{N} w_i\left\{\left(\sum_{j=0}^{y_i-1}\frac{\mu_i}{j\alpha+\mu_i}\right)x_i - \alpha^{-1}\ln(1+\alpha)\,\mu_i x_i\right\}$$

and

$$\frac{\partial\mathcal{L}}{\partial\alpha} = \sum_{i=1}^{N} w_i\left\{-\left(\sum_{j=0}^{y_i-1}\frac{\alpha^{-1}\mu_i}{j\alpha+\mu_i}\right) - \alpha^{-2}\mu_i\ln(1+\alpha) - \frac{y_i+\alpha^{-1}\mu_i}{1+\alpha} + \frac{y_i}{\alpha}\right\}$$
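A sketch comparing the two variance specifications on the docvisit data from Example 10.1; the fit with the smaller AIC/SBC in the "Model Fit Summary" is preferred:

   /* NEGBIN1: variance linear in the mean */
   proc countreg data=docvisit;
      model doctorco = sex illness income hscore / dist=negbin(p=1);
   run;

   /* NEGBIN2: variance quadratic in the mean */
   proc countreg data=docvisit;
      model doctorco = sex illness income hscore / dist=negbin(p=2);
   run;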
Zero-Inflated Count Regression Overview

The main motivation for zero-inflated count models is that real-life data frequently display overdispersion and excess zeros. Zero-inflated count models provide a way of modeling the excess zeros in addition to allowing for overdispersion. In particular, for each observation, there are two possible data generation processes. The result of a Bernoulli trial is used to determine which of the two processes is used. For observation $i$, Process 1 is chosen with probability $\varphi_i$ and Process 2 with probability $1-\varphi_i$. Process 1 generates only zero counts. Process 2 generates counts from either a Poisson or a negative binomial model. In general,

$$y_i \sim \begin{cases} 0 & \text{with probability } \varphi_i \\ g(y_i) & \text{with probability } 1-\varphi_i \end{cases}$$

Therefore, the probability of $\{Y_i = y_i\}$ can be described as

$$P(y_i = 0 \mid x_i) = \varphi_i + (1-\varphi_i)g(0)$$
$$P(y_i \mid x_i) = (1-\varphi_i)g(y_i), \qquad y_i > 0$$

where $g(y_i)$ follows either the Poisson or the negative binomial distribution. You can specify the probability $\varphi$ with the PROBZERO= option in the OUTPUT statement.

When the probability $\varphi_i$ depends on the characteristics of observation $i$, $\varphi_i$ is written as a function of $z_i'\gamma$, where $z_i'$ is the $1 \times (q+1)$ vector of zero-inflation covariates and $\gamma$ is the $(q+1) \times 1$ vector of zero-inflation coefficients to be estimated. (The zero-inflation intercept is $\gamma_0$; the coefficients for the $q$ zero-inflation covariates are $\gamma_1, \ldots, \gamma_q$.) The function $F$ that relates the product $z_i'\gamma$ (which is a scalar) to the probability $\varphi_i$ is called the zero-inflation link function,

$$\varphi_i = F_i = F(z_i'\gamma)$$

In the COUNTREG procedure, the zero-inflation covariates are indicated in the ZEROMODEL statement. Furthermore, the zero-inflation link function $F$ can be specified as either the logistic function,

$$F(z_i'\gamma) = \Lambda(z_i'\gamma) = \frac{\exp(z_i'\gamma)}{1+\exp(z_i'\gamma)}$$

or the standard normal cumulative distribution function (also called the probit function),

$$F(z_i'\gamma) = \Phi(z_i'\gamma) = \int_{-\infty}^{z_i'\gamma}\frac{1}{\sqrt{2\pi}}\exp(-u^2/2)\,du$$

The zero-inflation link function is indicated in the LINK option in the ZEROMODEL statement. The default ZI link function is the logistic function.
Zero-Inflated Poisson Regression

In the zero-inflated Poisson (ZIP) regression model, the data generation process referred to earlier as Process 2 is

$$g(y_i) = \frac{\exp(-\mu_i)\mu_i^{y_i}}{y_i!}$$

where $\mu_i = e^{x_i'\beta}$. Thus the ZIP model is defined as

$$P(y_i = 0 \mid x_i, z_i) = F_i + (1-F_i)\exp(-\mu_i)$$
$$P(y_i \mid x_i, z_i) = (1-F_i)\frac{\exp(-\mu_i)\mu_i^{y_i}}{y_i!}, \qquad y_i > 0$$

The conditional expectation and conditional variance of $y_i$ are given by

$$E(y_i \mid x_i, z_i) = \mu_i(1-F_i)$$
$$V(y_i \mid x_i, z_i) = E(y_i \mid x_i, z_i)(1 + \mu_i F_i)$$

Note that the ZIP model (as well as the ZINB model) exhibits overdispersion since $V(y_i \mid x_i, z_i) > E(y_i \mid x_i, z_i)$.

In general, the log-likelihood function of the ZIP model is

$$\mathcal{L} = \sum_{i=1}^{N} w_i \ln\left[P(y_i \mid x_i, z_i)\right]$$

After a specific link function (either logistic or standard normal) for the probability $\varphi_i$ is chosen, it is possible to write the exact expressions for the log-likelihood function and the gradient.

ZIP Model with Logistic Link Function

First, consider the ZIP model in which the probability $\varphi_i$ is expressed with a logistic link function, namely,

$$\varphi_i = \frac{\exp(z_i'\gamma)}{1+\exp(z_i'\gamma)}$$

The log-likelihood function is

$$\mathcal{L} = \sum_{\{i: y_i=0\}} w_i \ln\left[\exp(z_i'\gamma) + \exp(-\exp(x_i'\beta))\right]$$
$$+ \sum_{\{i: y_i>0\}} w_i\left[y_i x_i'\beta - \exp(x_i'\beta) - \sum_{k=2}^{y_i}\ln(k)\right]$$
$$- \sum_{i=1}^{N} w_i \ln\left[1+\exp(z_i'\gamma)\right]$$

See "Poisson Regression" on page 534 for the definition of $w_i$. The gradient for this model is given by

$$\frac{\partial\mathcal{L}}{\partial\gamma} = \sum_{\{i: y_i=0\}} w_i\,\frac{\exp(z_i'\gamma)}{\exp(z_i'\gamma)+\exp(-\exp(x_i'\beta))}\,z_i - \sum_{i=1}^{N} w_i\,\frac{\exp(z_i'\gamma)}{1+\exp(z_i'\gamma)}\,z_i$$

$$\frac{\partial\mathcal{L}}{\partial\beta} = \sum_{\{i: y_i=0\}} w_i\,\frac{-\exp(x_i'\beta)\exp(-\exp(x_i'\beta))}{\exp(z_i'\gamma)+\exp(-\exp(x_i'\beta))}\,x_i + \sum_{\{i: y_i>0\}} w_i\left[y_i - \exp(x_i'\beta)\right]x_i$$

ZIP Model with Standard Normal Link Function

Next, consider the ZIP model in which the probability $\varphi_i$ is expressed with a standard normal link function: $\varphi_i = \Phi(z_i'\gamma)$. The log-likelihood function is

$$\mathcal{L} = \sum_{\{i: y_i=0\}} w_i \ln\left\{\Phi(z_i'\gamma) + \left[1-\Phi(z_i'\gamma)\right]\exp(-\exp(x_i'\beta))\right\}$$
$$+ \sum_{\{i: y_i>0\}} w_i\left\{\ln\left[1-\Phi(z_i'\gamma)\right] - \exp(x_i'\beta) + y_i x_i'\beta - \sum_{k=2}^{y_i}\ln(k)\right\}$$

See "Poisson Regression" on page 534 for the definition of $w_i$. The gradient for this model is given by

$$\frac{\partial\mathcal{L}}{\partial\gamma} = \sum_{\{i: y_i=0\}} w_i\,\frac{\varphi(z_i'\gamma)\left[1-\exp(-\exp(x_i'\beta))\right]}{\Phi(z_i'\gamma)+\left[1-\Phi(z_i'\gamma)\right]\exp(-\exp(x_i'\beta))}\,z_i - \sum_{\{i: y_i>0\}} w_i\,\frac{\varphi(z_i'\gamma)}{1-\Phi(z_i'\gamma)}\,z_i$$

$$\frac{\partial\mathcal{L}}{\partial\beta} = \sum_{\{i: y_i=0\}} w_i\,\frac{-\left[1-\Phi(z_i'\gamma)\right]\exp(x_i'\beta)\exp(-\exp(x_i'\beta))}{\Phi(z_i'\gamma)+\left[1-\Phi(z_i'\gamma)\right]\exp(-\exp(x_i'\beta))}\,x_i + \sum_{\{i: y_i>0\}} w_i\left[y_i - \exp(x_i'\beta)\right]x_i$$

where $\varphi(\cdot)$ in the gradient denotes the standard normal probability density function.
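The decomposition of the zero probability can be checked numerically. A sketch, assuming the docvisit data from Example 10.1 (output names are illustrative); the recomputed p0 below should match the zero-count probability that PROBCOUNT(0) writes to the output data set:

   proc countreg data=docvisit;
      model doctorco = sex illness income / dist=zip;
      zeromodel doctorco ~ income;
      output out=zipout xbeta=xb zgamma=zg probcount(0);
   run;

   data check;
      set zipout;
      mu = exp(xb);               /* mu_i = exp(x_i'beta)                  */
      F  = logistic(zg);          /* F_i = exp(z_i'gamma)/(1+exp(z_i'gamma)) */
      p0 = F + (1 - F)*exp(-mu);  /* P(y_i = 0 | x_i, z_i)                 */
   run;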
Zero-Inflated Negative Binomial Regression

The zero-inflated negative binomial (ZINB) model in PROC COUNTREG is based on the negative binomial model with quadratic variance function (p=2). The ZINB model is obtained by specifying a negative binomial distribution for the data generation process referred to earlier as Process 2:

$$g(y_i) = \frac{\Gamma(y_i+\alpha^{-1})}{y_i!\,\Gamma(\alpha^{-1})}\left(\frac{\alpha^{-1}}{\alpha^{-1}+\mu_i}\right)^{\alpha^{-1}}\left(\frac{\mu_i}{\alpha^{-1}+\mu_i}\right)^{y_i}$$

Thus the ZINB model is defined to be

$$P(y_i = 0 \mid x_i, z_i) = F_i + (1-F_i)(1+\alpha\mu_i)^{-\alpha^{-1}}$$
$$P(y_i \mid x_i, z_i) = (1-F_i)\frac{\Gamma(y_i+\alpha^{-1})}{y_i!\,\Gamma(\alpha^{-1})}\left(\frac{\alpha^{-1}}{\alpha^{-1}+\mu_i}\right)^{\alpha^{-1}}\left(\frac{\mu_i}{\alpha^{-1}+\mu_i}\right)^{y_i}, \qquad y_i > 0$$

In this case, the conditional expectation and conditional variance of $y_i$ are

$$E(y_i \mid x_i, z_i) = \mu_i(1-F_i)$$
$$V(y_i \mid x_i, z_i) = E(y_i \mid x_i, z_i)\left[1 + \mu_i(F_i+\alpha)\right]$$

As with the ZIP model, the ZINB model exhibits overdispersion because the conditional variance exceeds the conditional mean.

ZINB Model with Logistic Link Function

In this model, the probability $\varphi_i$ is given by the logistic function, namely,

$$\varphi_i = \frac{\exp(z_i'\gamma)}{1+\exp(z_i'\gamma)}$$

The log-likelihood function is

$$\mathcal{L} = \sum_{\{i: y_i=0\}} w_i \ln\left[\exp(z_i'\gamma) + (1+\alpha\exp(x_i'\beta))^{-\alpha^{-1}}\right]$$
$$+ \sum_{\{i: y_i>0\}} w_i \sum_{j=0}^{y_i-1}\ln(j+\alpha^{-1})$$
$$+ \sum_{\{i: y_i>0\}} w_i\left\{-\ln(y_i!) - (y_i+\alpha^{-1})\ln(1+\alpha\exp(x_i'\beta)) + y_i\ln(\alpha) + y_i x_i'\beta\right\}$$
$$- \sum_{i=1}^{N} w_i \ln\left[1+\exp(z_i'\gamma)\right]$$

See "Poisson Regression" on page 534 for the definition of $w_i$. The gradient for this model is given by

$$\frac{\partial\mathcal{L}}{\partial\gamma} = \sum_{\{i: y_i=0\}} w_i\,\frac{\exp(z_i'\gamma)}{\exp(z_i'\gamma)+(1+\alpha\exp(x_i'\beta))^{-\alpha^{-1}}}\,z_i - \sum_{i=1}^{N} w_i\,\frac{\exp(z_i'\gamma)}{1+\exp(z_i'\gamma)}\,z_i$$

$$\frac{\partial\mathcal{L}}{\partial\beta} = \sum_{\{i: y_i=0\}} w_i\,\frac{-\exp(x_i'\beta)(1+\alpha\exp(x_i'\beta))^{-\alpha^{-1}-1}}{\exp(z_i'\gamma)+(1+\alpha\exp(x_i'\beta))^{-\alpha^{-1}}}\,x_i + \sum_{\{i: y_i>0\}} w_i\,\frac{y_i-\exp(x_i'\beta)}{1+\alpha\exp(x_i'\beta)}\,x_i$$

$$\frac{\partial\mathcal{L}}{\partial\alpha} = \sum_{\{i: y_i=0\}} w_i\,\frac{\alpha^{-2}\left[(1+\alpha\mu_i)\ln(1+\alpha\mu_i)-\alpha\mu_i\right]}{\exp(z_i'\gamma)(1+\alpha\mu_i)^{(1+\alpha)/\alpha}+(1+\alpha\mu_i)}$$
$$+ \sum_{\{i: y_i>0\}} w_i\left\{-\alpha^{-2}\sum_{j=0}^{y_i-1}\frac{1}{j+\alpha^{-1}} + \alpha^{-2}\ln(1+\alpha\mu_i) + \frac{y_i-\mu_i}{\alpha(1+\alpha\mu_i)}\right\}$$

where $\mu_i = \exp(x_i'\beta)$.

ZINB Model with Standard Normal Link Function

In this model, the probability $\varphi_i$ is specified with the standard normal distribution function: $\varphi_i = \Phi(z_i'\gamma)$. The log-likelihood function is

$$\mathcal{L} = \sum_{\{i: y_i=0\}} w_i \ln\left\{\Phi(z_i'\gamma) + \left[1-\Phi(z_i'\gamma)\right](1+\alpha\exp(x_i'\beta))^{-\alpha^{-1}}\right\}$$
$$+ \sum_{\{i: y_i>0\}} w_i \ln\left[1-\Phi(z_i'\gamma)\right]$$
$$+ \sum_{\{i: y_i>0\}} w_i \sum_{j=0}^{y_i-1}\ln(j+\alpha^{-1})$$
$$- \sum_{\{i: y_i>0\}} w_i \ln(y_i!)$$
$$- \sum_{\{i: y_i>0\}} w_i\,(y_i+\alpha^{-1})\ln(1+\alpha\exp(x_i'\beta))$$
$$+ \sum_{\{i: y_i>0\}} w_i\,y_i\ln(\alpha)$$
$$+ \sum_{\{i: y_i>0\}} w_i\,y_i x_i'\beta$$

See "Poisson Regression" on page 534 for the definition of $w_i$. The gradient for this model is given by

$$\frac{\partial\mathcal{L}}{\partial\gamma} = \sum_{\{i: y_i=0\}} w_i\,\frac{\varphi(z_i'\gamma)\left[1-(1+\alpha\exp(x_i'\beta))^{-\alpha^{-1}}\right]}{\Phi(z_i'\gamma)+\left[1-\Phi(z_i'\gamma)\right](1+\alpha\exp(x_i'\beta))^{-\alpha^{-1}}}\,z_i - \sum_{\{i: y_i>0\}} w_i\,\frac{\varphi(z_i'\gamma)}{1-\Phi(z_i'\gamma)}\,z_i$$

$$\frac{\partial\mathcal{L}}{\partial\beta} = \sum_{\{i: y_i=0\}} w_i\,\frac{-\left[1-\Phi(z_i'\gamma)\right]\exp(x_i'\beta)(1+\alpha\exp(x_i'\beta))^{-(1+\alpha)/\alpha}}{\Phi(z_i'\gamma)+\left[1-\Phi(z_i'\gamma)\right](1+\alpha\exp(x_i'\beta))^{-\alpha^{-1}}}\,x_i + \sum_{\{i: y_i>0\}} w_i\,\frac{y_i-\exp(x_i'\beta)}{1+\alpha\exp(x_i'\beta)}\,x_i$$

$$\frac{\partial\mathcal{L}}{\partial\alpha} = \sum_{\{i: y_i=0\}} w_i\,\frac{\left[1-\Phi(z_i'\gamma)\right]\alpha^{-2}\left[(1+\alpha\mu_i)\ln(1+\alpha\mu_i)-\alpha\mu_i\right]}{\Phi(z_i'\gamma)(1+\alpha\mu_i)^{(1+\alpha)/\alpha}+\left[1-\Phi(z_i'\gamma)\right](1+\alpha\mu_i)}$$
$$+ \sum_{\{i: y_i>0\}} w_i\left\{-\alpha^{-2}\sum_{j=0}^{y_i-1}\frac{1}{j+\alpha^{-1}} + \alpha^{-2}\ln(1+\alpha\mu_i) + \frac{y_i-\mu_i}{\alpha(1+\alpha\mu_i)}\right\}$$

where $\mu_i = \exp(x_i'\beta)$ and $\varphi(\cdot)$ denotes the standard normal probability density function.
Computational Resources

The time and memory required by PROC COUNTREG are proportional to the number of parameters in the model and the number of observations in the data set being analyzed. Less time and memory are required for smaller models and fewer observations. Also affecting these resources are the method chosen to calculate the variance-covariance matrix and the optimization method. All optimization methods available through the METHOD= option have similar memory use requirements. The processing time might differ for each method depending on the number of iterations and functional calls needed.

The data set is read into memory to save processing time. If not enough memory is available to hold the data, the COUNTREG procedure stores the data in a utility file on disk and rereads the data as needed from this file. When this occurs, the execution time of the procedure increases substantially. The gradient and the variance-covariance matrix must be held in memory. If the model has $p$ parameters including the intercept, then at least $8(p + p(p+1)/2)$ bytes are needed. If the quasi-maximum likelihood method is used to estimate the variance-covariance matrix (COVEST=QML), an additional $8p(p+1)/2$ bytes of memory are needed. For example, with $p = 10$ this amounts to at least $8(10 + 55) = 520$ bytes, plus another 440 bytes under COVEST=QML.

Time is also a function of the number of iterations needed to converge to a solution for the model parameters. The number of iterations needed cannot be known in advance. The MAXITER= option can be used to limit the number of iterations that PROC COUNTREG does. The convergence criteria can be altered by nonlinear optimization options available in the PROC COUNTREG statement. For a list of all the nonlinear optimization options, see Chapter 6, "Nonlinear Optimization Methods."
Nonlinear Optimization Options

PROC COUNTREG uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks. In the PROC COUNTREG statement, you can specify nonlinear optimization options that are then passed to the NLO subsystem. For a list of all the nonlinear optimization options, see Chapter 6, "Nonlinear Optimization Methods."
Covariance Matrix Types

The COUNTREG procedure enables you to specify the estimation method for the covariance matrix. The COVEST=HESSIAN option estimates the covariance matrix based on the inverse of the Hessian matrix, COVEST=OP uses the outer product of gradients, and COVEST=QML produces the covariance matrix based on both the Hessian and outer product matrices. The default is COVEST=HESSIAN.

While all three methods produce asymptotically equivalent results, they differ in computational intensity and produce results that might differ in finite samples. The COVEST=OP option provides the covariance matrix that is typically the easiest to compute. In some cases, the OP approximation is considered more efficient than the Hessian or QML approximations because it contains fewer random elements. The QML approximation is computationally the most complex because both the outer product of gradients and the Hessian matrix are required. In most cases, OP or Hessian approximations are preferred to QML. The need to use QML approximation arises in some cases when the model is misspecified and the information matrix equality does not hold.
Displayed Output

PROC COUNTREG produces the following displayed output.

Iteration History for Parameter Estimates

If you specify the ITPRINT or PRINTALL options in the PROC COUNTREG statement, PROC COUNTREG displays a table that contains the following information for each iteration. Note that some information is specific to the model-fitting procedure chosen (for example, Newton-Raphson, trust region, quasi-Newton).

   - iteration number
   - number of restarts since the fitting began
   - number of function calls
   - number of active constraints at the current solution
   - value of the objective function (-1 times the log-likelihood value) at the current solution
   - change in the objective function from previous iteration
   - value of the maximum absolute gradient element
   - step size (for Newton-Raphson and quasi-Newton methods)
   - slope of the current search direction (for Newton-Raphson and quasi-Newton methods)
   - lambda (for trust region method)
   - radius value at current iteration (for trust region method)
Model Fit Summary

The "Model Fit Summary" table contains the following information:

   - dependent (count) variable name
   - number of observations used
   - number of missing values in data set, if any
   - data set name
   - type of model that was fit
   - offset variable name, if any
   - zero-inflated link function, if any
   - zero-inflated offset variable name, if any
   - log-likelihood value at solution
   - maximum absolute gradient at solution
   - number of iterations
   - AIC value at solution (a smaller value indicates better fit)
   - SBC value at solution (a smaller value indicates better fit)

Under the "Model Fit Summary" is a statement about whether the algorithm successfully converged.
Parameter Estimates

The "Parameter Estimates" table gives the estimates of the model parameters. In zero-inflated (ZI) models, estimates are also given for the ZI intercept and ZI regressor parameters labeled with the prefix "Inf_". For example, the ZI intercept is labeled "Inf_intercept". If you specify "Age" as a ZI regressor, then the "Parameter Estimates" table labels the corresponding parameter estimate "Inf_Age". If you do not list any ZI regressors, then only the ZI intercept term is estimated. "_Alpha" is the negative binomial dispersion parameter. The $t$ statistic given for "_Alpha" is a test of overdispersion.

Last Evaluation of the Gradient

If you specify the model option ITPRINT, the COUNTREG procedure displays the last evaluation of the gradient vector.

Covariance of Parameter Estimates

If you specify the COVB option in the MODEL statement or in the PROC COUNTREG statement, the COUNTREG procedure displays the estimated covariance matrix, defined as the inverse of the information matrix at the final iteration.

Correlation of Parameter Estimates

If you specify the CORRB option in the MODEL statement or in the PROC COUNTREG statement, PROC COUNTREG displays the estimated correlation matrix. It is based on the Hessian matrix used at the final iteration.

OUTPUT OUT= Data Set

The OUTPUT statement creates a new SAS data set that contains all the variables in the input data set and, optionally, the estimates of $x_i'\beta$, the expected value of the response variable, and the probability of the response variable taking on the current value or other values that you specify. In a zero-inflated model you can additionally request that the output data set contain the estimates of $z_i'\gamma$, and the probability that the response is zero as a result of the zero-generating process. Except for the probability of the current value, these statistics can be computed for all observations in which the regressors are not missing, even if the response is missing. By adding observations with missing response values to the input data set, you can compute these statistics for new observations or for settings of the regressors not present in the data without affecting the model fit.
OUTEST= Data Set

The OUTEST= data set is made up of one row (with _TYPE_='PARM') that contains each of the parameter estimates in the model. The second row (with _TYPE_='STD') contains the standard errors for the parameter estimates in the model. If you use the COVOUT option in the PROC COUNTREG statement, the OUTEST= data set also contains the covariance matrix for the parameter estimates. The covariance matrix appears in the observations with _TYPE_='COV', and the _NAME_ variable labels the rows with the parameter names. The names of the parameters are used as variable names. These are the same names as used in the INIT, BOUNDS, and RESTRICT statements.
ODS Table Names

PROC COUNTREG assigns a name to each table it creates. You can use these names to denote the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 10.2.

Table 10.2 ODS Tables Produced in PROC COUNTREG

   ODS Table Name              Description                          Option

   ODS Tables Created by the MODEL Statement
   ClassLevels                 Class levels                         Default
   FitSummary                  Summary of nonlinear estimation      Default
   ConvergenceStatus           Convergence status                   Default
   ParameterEstimates          Parameter estimates                  Default
   CovB                        Covariance of parameter estimates    COVB
   CorrB                       Correlation of parameter estimates   CORRB
   InputOptions                Input options                        ITPRINT
   IterStart                   Optimization start                   ITPRINT
   IterHist                    Iteration history                    ITPRINT
   IterStop                    Optimization results                 ITPRINT
   ParameterEstimatesResults   Parameter estimates                  ITPRINT
   ParameterEstimatesStart     Parameter estimates                  ITPRINT
   ProblemDescription          Problem description                  ITPRINT
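A sketch that uses an ODS table name from Table 10.2 to capture the parameter estimates in a data set (the data set names are illustrative):

   ods output ParameterEstimates=pe;

   proc countreg data=docvisit;
      model doctorco = sex illness income / dist=poisson;
   run;

   proc print data=pe;
   run;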
Examples: COUNTREG Procedure
Example 10.1: Basic Models

Data Description and Objective

The data set docvisit contains information for approximately 5,000 Australian individuals about the number and possible determinants of doctor visits that were made during a two-week interval. This data set contains a subset of variables taken from the Racd3 data set used by Cameron and Trivedi (1998). The docvisit data set can be found in the SAS/ETS Sample Library.

The variable doctorco represents doctor visits. Additional variables in the data set that you want to evaluate as determinants of doctor visits include sex (coded 0=male, 1=female), age (age in years divided by 100), illness (number of illnesses during the two-week interval, with five or more coded as five), income (annual income in Australian dollars divided by 1,000), and hscore (a general health questionnaire score, where a high score indicates bad health). Summary statistics for these variables are computed in the following statements and presented in Output 10.1.1.

   proc means data=docvisit;
      var doctorco sex age illness income hscore;
   run;
Output 10.1.1 Summary Statistics

                              The MEANS Procedure

   Variable        N          Mean       Std Dev       Minimum       Maximum
   --------------------------------------------------------------------------
   doctorco     5190     0.3017341     0.7981338             0     9.0000000
   sex          5190     0.5206166     0.4996229             0     1.0000000
   age          5190     0.4063854     0.2047818     0.1900000     0.7200000
   illness      5190     1.4319846     1.3841524             0     5.0000000
   income       5190     0.5831599     0.3689067             0     1.5000000
   hscore       5190     1.2175337     2.1242665             0    12.0000000
   --------------------------------------------------------------------------
Poisson Model

The following statements fit a Poisson model to the data by using the covariates SEX, ILLNESS, INCOME, and HSCORE:

   proc countreg data=docvisit;
      model doctorco = sex illness income hscore / dist=poisson printall;
   run;
In this example, the DIST= option in the MODEL statement specifies the POISSON distribution. In addition, the PRINTALL option displays the correlation and covariance matrices for the parameters, log-likelihood values, and convergence information in addition to the parameter estimates. The parameter estimates for this model are shown in Output 10.1.2.

Output 10.1.2 Parameter Estimates of Poisson Model

                        The COUNTREG Procedure

                        Parameter Estimates

                                  Standard            Approx
   Parameter    DF    Estimate       Error   t Value  Pr > |t|
   Intercept     1   -1.855552    0.074545    -24.89    <.0001
   sex           1    0.235583    0.054362      4.33    <.0001
   illness       1    0.270326    0.017080     15.83    <.0001
   income        1   -0.242095    0.077829     -3.11    0.0019
   hscore        1    0.096313    0.009089     10.60    <.0001
   proc print data=probdata( where=( pid >= 49 ) );
      var mode decision p ttime;
      id pid;
   run;
The last nine observations from the forecast data set (probdata) are displayed in Figure 17.7. It is expected that the decision maker will choose mode "1" based on predicted probabilities for all modes.

Figure 17.7 Out-of-Sample Mode Choice Forecast

   pid   mode   decision         p    ttime
    49      1          0   0.46393   11.852
    49      2          1   0.41753   12.147
    49      3          0   0.11853   15.672
    50      1          0   0.06936   15.557
    50      2          1   0.92437    8.307
    50      3          0   0.00627   22.286
    51      1          .   0.93611    5.000
    51      2          .   0.02630   15.000
    51      3          .   0.03759   14.000
Nested Logit Modeling

A more general model can be specified using the nested logit model. Consider, for example, the following random utility function:

$$U_{ij} = x_{ij}'\beta + \epsilon_{ij}, \qquad j = 1, \ldots, 3$$

Suppose the set of all alternatives indexed by $j$ is partitioned into $K$ nests, $B_1, \ldots, B_K$. The nested logit model is obtained by assuming that the error term in the utility function has the GEV cumulative
distribution function

$$\exp\left(-\sum_{k=1}^{K}\left(\sum_{j \in B_k}\exp\{-\epsilon_{ij}/\tau_k\}\right)^{\tau_k}\right)$$

where $\tau_k$ is a measure of a degree of independence among the alternatives in nest $k$. When $\tau_k = 1$ for all $k$, the model reduces to the standard logit model (a one-line verification appears after Figure 17.8).

Since the public transportation modes, 1 and 2, tend to be correlated, these two choices can be grouped together. The decision tree displayed in Figure 17.8 is constructed.

Figure 17.8 Decision Tree for Mode Choice
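To see why $\tau_k = 1$ for all $k$ recovers the standard (conditional) logit, note that the nesting then collapses:

\[
\exp\left(-\sum_{k=1}^{K}\left(\sum_{j \in B_k} e^{-\epsilon_{ij}}\right)^{1}\right)
= \exp\left(-\sum_{j=1}^{J} e^{-\epsilon_{ij}}\right)
= \prod_{j=1}^{J}\exp\left(-e^{-\epsilon_{ij}}\right)
\]

which is a product of independent type I extreme-value distribution functions, the error assumption underlying the conditional logit model.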
The two-level decision tree is specified in the NEST statement. The NCHOICE= option is not allowed for nested logit estimation. Instead, the CHOICE= option needs to be specified, as in the following statements:

   /*-- nested logit estimation --*/
   proc mdc data=newdata;
      model decision = ttime /
               type=nlogit
               choice=(mode 1 2 3)
               covest=hess;
      id pid;
      utility u(1,) = ttime;
      nest level(1) = (1 2 @ 1, 3 @ 2),
           level(2) = (1 2 @ 1);
   run;
In Figure 17.9, estimates of the inclusive value parameters, INC_L2G1C1 and INC_L2G1C2, are indicative of a nested model structure. See the section “Nested Logit” on page 956 and the section “Decision Tree and Nested Logit” on page 958 for more details about inclusive values.
Figure 17.9 Two-Level Nested Logit Estimates

                        The MDC Procedure
                     Nested Logit Estimates

                        Parameter Estimates

                                  Standard            Approx
   Parameter     DF   Estimate       Error   t Value  Pr > |t|
   ttime_L1       1    -0.4040      0.1241     -3.25    0.0011
   INC_L2G1C1     1     0.8016      0.4352      1.84    0.0655
   INC_L2G1C2     1     0.8087      0.3591      2.25    0.0243
The nested logit model is estimated with the restriction INC_L2G1C1 = INC_L2G1C2 by specifying the SAMESCALE option, as in the following statements:

   /*-- nlogit with samescale option --*/
   proc mdc data=newdata;
      model decision = ttime /
               type=nlogit
               choice=(mode 1 2 3)
               samescale
               covest=hess;
      id pid;
      utility u(1,) = ttime;
      nest level(1) = (1 2 @ 1, 3 @ 2),
           level(2) = (1 2 @ 1);
   run;
The estimation result is displayed in Figure 17.10.

Figure 17.10 Nested Logit Estimates with One Dissimilarity Parameter

                        The MDC Procedure
                     Nested Logit Estimates

                        Parameter Estimates

                                  Standard            Approx
   Parameter     DF   Estimate       Error   t Value  Pr > |t|
   ttime_L1       1    -0.4025      0.1217     -3.31    0.0009
   INC_L2G1       1     0.8209      0.3019      2.72    0.0066
The nested logit model is equivalent to the conditional logit model if INC_L2G1C1 = INC_L2G1C2 = 1. You can verify this relationship by estimating a constrained nested logit model as shown in the following statements. (See the section “RESTRICT Statement” on page 946 for details about imposing linear restrictions on parameter estimates.)
   /*-- constrained nested logit estimation --*/
   proc mdc data=newdata;
      model decision = ttime /
               type=nlogit
               choice=(mode 1 2 3)
               covest=hess;
      id pid;
      utility u(1,) = ttime;
      nest level(1) = (1 2 @ 1, 3 @ 2),
           level(2) = (1 2 @ 1);
      restrict INC_L2G1C1 = 1, INC_L2G1C2 = 1;
   run;
The parameter estimates and the active linear constraints for the constrained nested logit model are displayed in Figure 17.11.

Figure 17.11 Constrained Nested Logit Estimates

                        The MDC Procedure
                     Nested Logit Estimates

                        Parameter Estimates

                                  Standard            Approx
   Parameter     DF   Estimate       Error   t Value  Pr > |t|
   ttime_L1       1    -0.3572      0.0776     -4.60    <.0001
   INC_L2G1C1     0     1.0000           0         .         .
   INC_L2G1C2     0     1.0000           0         .         .
   Restrict1      1    -2.1706      8.4098     -0.26         .
   Restrict2      1     3.6573     10.0001      0.37         .
The parameters SCALE2 and SCALE3 in the output correspond to the estimates of the scale parameters $\theta_2$ and $\theta_3$, respectively. Note that the estimate of the HEV model is not always stable because computation of the log-likelihood function requires numerical integration. Bhat (1995) proposed the Gauss-Laguerre method. In general, the log-likelihood function value of HEV should be larger than that of conditional logit because HEV models include the conditional logit as a special case. However, in this example the reverse is true (-33.414 for the HEV model, which is less than -33.321 for the conditional logit model). (See Figure 17.14 and Figure 17.3.) This indicates that the Gauss-Laguerre approximation to the true probability is too coarse. You can see how well the Gauss-Laguerre method works by specifying a unit scale restriction for all modes, as in the following statements, since the HEV model with the unit variance for all modes reduces to the conditional logit model:
   /*-- hev with gauss-laguerre and unit scale --*/
   proc mdc data=newdata;
      model decision = ttime /
               type=hev
               nchoice=3
               hev=(unitscale=1 2 3, integrate=laguerre)
               covest=hess;
      id pid;
   run;
Figure 17.16 shows that the ttime coefficient is not close to that of the conditional logit model.

Figure 17.16 HEV Estimates with All Unit Scale Parameters

                        The MDC Procedure
            Heteroscedastic Extreme Value Model Estimates

                        Parameter Estimates

                                  Standard            Approx
   Parameter     DF   Estimate       Error   t Value  Pr > |t|
   ttime          1    -0.2926      0.0438     -6.68    <.0001
With the INTEGRATE=HARDY option, the log-likelihood function value of the HEV model, -33.026, is greater than that of the conditional logit model, -33.321. (See Figure 17.17 and Figure 17.3.) When you impose unit scale restrictions on all choices, as in the following statements, the HEV model gives the same estimates as the conditional logit model. (See Figure 17.19 and Figure 17.6.)

   /*-- hev with adaptive integration and unit scale --*/
   proc mdc data=newdata;
      model decision = ttime /
               type=hev
               nchoice=3
               hev=(unitscale=1 2 3, integrate=hardy)
               covest=hess;
      id pid;
   run;
Figure 17.19 Alternative HEV Estimates with Unit Scale Restrictions

                        The MDC Procedure
            Heteroscedastic Extreme Value Model Estimates

                        Parameter Estimates

                                  Standard            Approx
   Parameter     DF   Estimate       Error   t Value  Pr > |t|
   ttime          1    -0.3572      0.0776     -4.60    <.0001

[Multinomial probit estimates: only fragments of this output survive -- parameters ttime, STD_3, RHO_31, and Restrict1 (labeled "Linear EC [ 1 ]"); t values -3.49, 2.45, and 0.77; Pr > |t| values 0.0005, 0.0143, and 0.4499*. * Probability computed using beta distribution.]
Note that in the output the estimates of standard errors and correlations are denoted by STD_i and RHO_ij, respectively. In this particular case the first two variances (STD_1 and STD_2) are normalized to one, and corresponding correlations (RHO_21 and RHO_32) are set to zero, so they are not listed among parameter estimates.
Parameter Heterogeneity: Mixed Logit

One way of modeling unobserved heterogeneity across individuals in their sensitivity to observed exogenous variables is to use the mixed logit model with a random parameters or random coefficients specification. The probability of choosing alternative $j$ is written as

$$P_i(j) = \frac{\exp(x_{ij}'\beta)}{\sum_{k=1}^{J}\exp(x_{ik}'\beta)}$$

where $\beta$ is a vector of coefficients that varies across individuals and $x_{ij}$ is a vector of exogenous attributes. For example, you can specify the distribution of the parameter $\beta$ to be the normal distribution.

The mixed logit model uses a Monte Carlo simulation method to estimate the probabilities of choice. There are two simulation methods available. If the RANDNUM=PSEUDO option is specified in the MODEL statement, pseudo-random numbers are generated; if the RANDNUM=HALTON option is specified, Halton quasi-random sequences are used. The default value is RANDNUM=HALTON.
You can estimate the model with normally distributed random coefficients of ttime with the following SAS statements:

   /*-- mixed logit estimation --*/
   proc mdc data=newdata type=mixedlogit;
      model decision = ttime /
               nchoice=3
               mixed=(normalparm=ttime);
      id pid;
   run;

Let $\beta^m$ and $\beta^s$ be mean and scale parameters, respectively, for the random coefficient, $\beta$. The relevant utility function is

$$U_{ij} = \mathrm{ttime}_{ij}\beta + \epsilon_{ij}$$

where $\beta = \beta^m + \beta^s\eta$ ($\beta^m$ and $\beta^s$ are fixed mean and scale parameters, respectively). The stochastic component, $\eta$, is assumed to be standard normal since the NORMALPARM= option is given. Alternatively, the UNIFORMPARM= or LOGNORMALPARM= option can be specified. The LOGNORMALPARM= option is useful when nonnegative parameters are being estimated. The NORMALPARM=, UNIFORMPARM=, and LOGNORMALPARM= variables must be included in the right-hand side of the MODEL statement. See the section "Mixed Logit Model" on page 953 for more details. To estimate a mixed logit model by using the transportation mode choice data, the MDC procedure requires the MIXED= option for random components. Results of the mixed logit estimation are displayed in Figure 17.21.
Figure 17.21 Mixed Logit Model Parameter Estimates

                        The MDC Procedure
               Mixed Multinomial Logit Estimates

                        Parameter Estimates

                                  Standard            Approx
   Parameter     DF   Estimate       Error   t Value  Pr > |t|
   ttime_M        1    -0.5342      0.2184     -2.45    0.0144
   ttime_S        1     0.2843      0.1911      1.49    0.1368
Note that the parameter ttime_M corresponds to the constant mean parameter $\beta^m$ and the parameter ttime_S corresponds to the constant scale parameter $\beta^s$ of the random coefficient $\beta$.
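Because a lognormally distributed coefficient is restricted to be positive, a sign flip on the regressor is the usual device when a negative effect is expected. A sketch (the derived data set newdata2 and the variable nttime are introduced here for illustration only):

   data newdata2;
      set newdata;
      nttime = -ttime;   /* lognormal coefficient on -ttime implies ttime effect < 0 */
   run;

   proc mdc data=newdata2 type=mixedlogit;
      model decision = nttime /
               nchoice=3
               mixed=(lognormalparm=nttime);
      id pid;
   run;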
Syntax: MDC Procedure

The MDC procedure is controlled by the following statements:

   PROC MDC options ;
      MDCDATA options ;
      BOUNDS bound1 < , bound2 . . . > ;
      BY variables ;
      CLASS options ;
      ID variable ;
      MODEL dependent variables = regressors / options ;
      NEST LEVEL(value) = ((values)@(value), . . . , (values)@(value)) ;
      NLOPTIONS options ;
      OUTPUT options ;
      RESTRICT restriction1 < , restriction2 . . . > ;
      TEST options ;
      UTILITY U() = variables, . . . , U() = variables ;
Functional Summary

Table 17.2 summarizes the statements and options used with the MDC procedure.

Table 17.2 MDC Functional Summary

   Description                                            Statement   Option

   Data Set Options
   Formats the data for use by PROC MDC                   MDCDATA
   Specifies the input data set                           MDC         DATA=
   Specifies the output data set for the CLASS statement  CLASS       OUT=
   Writes parameter estimates to an output data set       MDC         OUTEST=
   Includes covariances in the OUTEST= data set           MDC         COVOUT
   Writes linear predictors and predicted probabilities   OUTPUT      OUT=
     to an output data set

   Declaring the Role of Variables
   Specifies the ID variable                              ID
   Specifies BY-group processing variables                BY

   Printing Control Options
   Requests all printing options                          MODEL       ALL
   Displays correlation matrix of the estimates           MODEL       CORRB
   Displays covariance matrix of the estimates            MODEL       COVB
   Displays detailed information about optimization       MODEL       ITPRINT
     iterations
   Suppresses all displayed output                        MODEL       NOPRINT

   Model Estimation Options
   Specifies the choice variables                         MODEL       CHOICE=()
   Specifies the convergence criterion                    MODEL       CONVERGE=
   Specifies the type of covariance matrix               MODEL       COVEST=
   Specifies the starting point of the Halton sequence    MODEL       HALTONSTART=
   Specifies options specific to the HEV model            MODEL       HEV=()
   Sets the initial values of parameters used by the      MODEL       INITIAL=()
     iterative optimization algorithm
   Specifies the maximum number of iterations             MODEL       MAXITER=
   Specifies the options specific to mixed logit          MODEL       MIXED=()
   Specifies the number of choices for each person        MODEL       NCHOICE=
   Specifies the number of simulations                    MODEL       NSIMUL=
   Specifies the optimization technique                   MODEL       OPTMETHOD=
   Specifies the type of random number generators         MODEL       RANDNUM=
   Specifies that initial values are generated using      MODEL       RANDINIT
     random numbers
   Specifies the rank dependent variable                  MODEL       RANK
   Specifies optimization restart options                 MODEL       RESTART=()
   Specifies a restriction on inclusive parameters        MODEL       SAMESCALE
   Specifies a seed for pseudo-random number generation   MODEL       SEED=
   Specifies a stated preference data restriction on      MODEL       SPSCALE
     inclusive parameters
   Specifies the type of the model                        MODEL       TYPE=
   Specifies normalization restrictions on multinomial    MODEL       UNITVARIANCE=()
     probit error variances

   Controlling the Optimization Process
   Specifies upper and lower bounds for the parameter     BOUNDS
     estimates
   Specifies linear restrictions on the parameter         RESTRICT
     estimates
   Specifies nonlinear optimization options               NLOPTIONS

   Nested Logit Related Options
   Specifies the tree structure                           NEST        LEVEL()=
   Specifies the type of utility function                 UTILITY     U()=

   Output Control Options
   Outputs predicted probabilities                        OUTPUT      P=
   Outputs estimated linear predictor                     OUTPUT      XBETA=

   Test Request Options
   Requests Wald, Lagrange multiplier, and likelihood     TEST        ALL
     ratio tests
   Requests the Wald test                                 TEST        WALD
   Requests the Lagrange multiplier test                  TEST        LM
   Requests the likelihood ratio test                     TEST        LR
PROC MDC Statement

   PROC MDC options ;
The following options can be used in the PROC MDC statement.

DATA=SAS-data-set
   specifies the input SAS data set. If the DATA= option is not specified, PROC MDC uses the most recently created SAS data set.

OUTEST=SAS-data-set
   names the SAS data set that the parameter estimates are written to. See "OUTEST= Data Set" later in this chapter for information about the contents of this data set.

COVOUT
   writes the covariance matrix for the parameter estimates to the OUTEST= data set. This option is valid only if the OUTEST= option is specified.

In addition, any of the following MODEL statement options can be specified in the PROC MDC statement, which is equivalent to specifying the option for the MODEL statement: ALL, CONVERGE=, CORRB, COVB, COVEST=, HALTONSTART=, ITPRINT, MAXITER=, NOPRINT, NSIMUL=, OPTMETHOD=, RANDINIT, RANK, RESTART=, SAMESCALE, SEED=, SPSCALE, TYPE=, and UNITVARIANCE=.
MDCDATA Statement

   MDCDATA options < / OUT= SAS-data-set > ;
The MDCDATA statement prepares data for use by PROC MDC when the choice-specific information is stored in multiple variables (for example, see Figure 17.1 in the section "Conditional Logit: Estimation and Prediction" on page 915).

VARLIST ( name1 = (var1 var2 . . . ) name2 = (var1 var2 . . . ) . . . )
   creates name variables from a multiple-variable list of choice alternatives in parentheses. The choice-specific dummy variables are created for the first set of multiple variables. At least one set of multiple variables must be specified. The order of (var1 var2 . . . ) in the VARLIST option determines the numbering of the alternatives; that is, var1 corresponds to alternative 1, var2 corresponds to alternative 2, and so on.

SELECT=(variable)
   specifies a variable that contains choices for each individual. The SELECT= variable needs to be a character-type variable, with values that match variable names in the first VARLIST option: name1=(var1 var2 . . . ).

ID=(name)
   creates a variable that identifies each individual.

ALT=(name)
   identifies selection alternatives for each individual.

DECVAR=(name)
   creates a 0/1 variable that indicates the choice made for each individual.

OUT=SAS-data-set
   specifies a SAS data set to which the modified data are output.
BOUNDS Statement

   BOUNDS bound1 < , bound2 . . . > ;
The BOUNDS statement imposes simple boundary constraints on the parameter estimates. BOUNDS statement constraints refer to the parameters estimated by the MDC procedure. You can specify any number of BOUNDS statements. Each bound is composed of parameters, constants, and inequality operators:

   item operator item < operator item < operator item . . . > > ;

Each item is a constant, parameter, or list of parameters. Parameters associated with a regressor variable are referred to by the name of the corresponding regressor variable. Each operator is <, >, <=, or >=. You can use both the BOUNDS statement and the RESTRICT statement to impose boundary constraints; however, the BOUNDS statement provides a simpler syntax for specifying these kinds of constraints. See also the section "RESTRICT Statement" on page 946.

Lagrange multipliers are reported for all the active boundary constraints. In the displayed output, the Lagrange multiplier estimates are identified with the names Restrict1, Restrict2, and so on. The probability of the Lagrange multipliers is computed using a beta distribution (LaMotte 1994). Nonactive (nonbinding) bounds have no effect on the estimation results and are not noted in the output.
The following BOUNDS statement constrains the estimates of the coefficient of ttime to be negative and the coefficients of x1 through x10 to be between zero and one. This example illustrates the use of parameter lists to specify boundary constraints.

   bounds ttime < 0,
          0 < x1-x10 < 1;
BY Statement

   BY variables ;
A BY statement can be used with PROC MDC to obtain separate analyses on observations in groups defined by the BY variables.
CLASS Statement

   CLASS variables ;
The CLASS statement names the classification variables to be used in the analysis. Classification variables can be either character or numeric.
ID Statement

   ID variable ;
The ID statement must be used with PROC MDC to specify the identification variable that controls multiple choice-specific cases. The MDC procedure requires only one ID statement even with multiple MODEL statements.
MODEL Statement

   MODEL dependent = regressors < / options > ;
The MODEL statement specifies the dependent variable and independent regressor variables for the regression model. When the nested logit model is estimated, regressors in the UTILITY statement are used for estimation. The following options can be used in the MODEL statement after a slash (/).
CHOICE=( variables )
CHOICE=( variable numbers )
   specifies the variables that contain possible choices for each individual. Choice variables must have integer values. Multiple choice variables are allowed only for nested logit models. If all possible alternatives are listed after the variable name, the MDC procedure checks all values of the choice variable against the list. The CHOICE=(X 1 2 3) specification implies that the value of X should be 1, 2, or 3. On the other hand, the CHOICE=(X) specification considers all distinct nonmissing values of X as elements of the choice set.
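The following fragments illustrate the difference between the two forms; the data set and variable names are assumed for illustration only:

   /* the choice set is restricted to the values 1, 2, and 3 */
   model decision = cost / type=clogit choice=(x 1 2 3);

   /* the choice set consists of all distinct nonmissing values of x */
   model decision = cost / type=clogit choice=(x);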
CONVERGE=number
   specifies the convergence criterion. The CONVERGE= option is the same as the ABSGCONV= option in the NLOPTIONS statement. The ABSGCONV= option in the NLOPTIONS statement overrides the CONVERGE= option. The default value is 1E-5.
HALTONSTART=number
   specifies the starting point of the Halton sequence. The specified number must be a positive integer. The default is HALTONSTART=11.
HEV=( option-list )
   specifies options that are used to estimate the HEV model. For example, the HEV model with a unit scale for alternative 1 is estimated using the following SAS statement:

      model y = x1 x2 x3 / hev=(unitscale=1);

   The following options can be used in the HEV= option. These options are listed within parentheses and separated by commas.

   INTORDER=number
      specifies the number of summation terms for Gaussian quadrature integration. The default is INTORDER=40. The maximum order is limited to 45. This option applies only to the INTEGRATE=LAGUERRE method.

   UNITSCALE=number-list
      specifies restrictions on scale parameters of stochastic utility components.

   INTEGRATE=LAGUERRE | HARDY
      specifies the integration method. The INTEGRATE=HARDY option specifies an adaptive integration method, while the INTEGRATE=LAGUERRE option specifies the Gauss-Laguerre approximation method. The default is INTEGRATE=LAGUERRE.
MIXED=( option-list )
   specifies options that are used for mixed logit estimation. For example, the mixed logit model with normally distributed random parameters is specified as follows:

      model y = x1 x2 x3 / mixed=(normalparm=x1);

   The following options can be used in the MIXED= option. The options are listed within parentheses and separated by commas.
   LOGNORMALPARM=variables
      specifies the variables whose random coefficients are lognormally distributed. LOGNORMALPARM= variables must be included on the right-hand side of the MODEL statement.

   NORMALEC=variables
      specifies the error component variables whose coefficients have a normal distribution $N(0, \sigma^2)$.

   NORMALPARM=variables
      specifies the variables whose random coefficients are normally distributed. NORMALPARM= variables must be included on the right-hand side of the MODEL statement.

   UNIFORMEC=variables
      specifies the error component variables whose coefficients have a uniform distribution $U(-\sqrt{3}\sigma, \sqrt{3}\sigma)$.

   UNIFORMPARM=variables
      specifies the variables whose random coefficients are uniformly distributed. UNIFORMPARM= variables must be included on the right-hand side of the MODEL statement.
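For example, the following hypothetical MODEL statement combines a normally distributed random coefficient for ttime with a normal error component for an assumed variable named brand:

   /* random coefficient plus error component (illustrative sketch) */
   model decision = ttime / nchoice=3 type=mixedlogit
            mixed=(normalparm=ttime, normalec=brand);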
NCHOICE=number
   specifies the number of choices for multinomial choice models when all individuals have the same choice set. When individuals have different numbers of choices, the NCHOICE= option is not allowed, and the CHOICE= option should be used. The NCHOICE= and CHOICE= options must not be used simultaneously, and the NCHOICE= option cannot be used for nested logit models.
NSIMUL=number
   specifies the number of simulations when the mixed logit or multinomial probit model is estimated. The default is NSIMUL=100. In general, you need a smaller number of simulations with RANDNUM=HALTON than with RANDNUM=PSEUDO.
RANDNUM=value
   specifies the type of the random number generator used for simulation. RANDNUM=HALTON is the default. The following option values are allowed:

   PSEUDO   specifies pseudo-random number generation.
   HALTON   specifies Halton sequence generation.
RANDINIT
RANDINIT=number
   specifies that initial parameter values be perturbed by uniform pseudo-random numbers for numerical optimization of the objective function. The default is $U(-1, 1)$. When the RANDINIT=r option is specified, $U(-r, r)$ pseudo-random numbers are generated. The value r should be positive. With a RANDINIT or RANDINIT= option, there are pure random searches for a given number of trials (1,000 for conditional or nested logit, and 500 for other models) to get a maximum (or minimum) value of the objective function. For example, when there is a parameter estimate with an initial value of 1, the RANDINIT option adds a generated random number $u$ to the initial value and computes an objective function value by using $1 + u$.
This option is helpful in finding initial values automatically if there is no guidance in setting the initial estimate.
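For example, the following hypothetical statement perturbs the starting values with $U(-0.5, 0.5)$ pseudo-random numbers before the optimization begins (data set and variables assumed):

   model decision = ttime / nchoice=3 type=mixedlogit
            mixed=(normalparm=ttime) randinit=0.5;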
RANK
   specifies that the dependent variable contain ranks. The numbers must be positive integers starting from 1. When the dependent variable has value 1, the corresponding alternative is chosen. This option is provided only as a convenience to the user; the extra information contained in the ranks is not currently used for estimation purposes.
RESTART=( option-list )
   specifies options that are used for reiteration of the optimization problem. For example, the following statement reestimates a conditional logit model by adding the ADDVALUE= values:

      model y = x1 x2 / type=clogit restart=(addvalue=(.01 .01));

   If the ADDVALUE= option contains missing values, the RESTART= option uses the corresponding estimate from the initial stage. If no ADDVALUE= value is specified for an estimate, a default value equal to (|estimate| * 1e-3) is added to the corresponding estimate from the initial stage. When the ADDRANDOM option is specified, the initial value of the reiteration is computed using random grid searches around the initial solution. If both the ADDVALUE= and ADDRANDOM(=) options are specified, ADDVALUE= is ignored.

   The following options can be used in the RESTART= option. The options are listed within parentheses.

   ADDMAXIT=number
      specifies the maximum number of iterations for the second stage of the estimation. The default is ADDMAXIT=100.

   ADDRANDOM
   ADDRANDOM=value
      specifies random values added to the estimates from the initial stage. With the ADDRANDOM option, $U(-1, 1)$ random numbers are created and added to the estimates obtained in the initial stage. When the ADDRANDOM=r option is specified, $U(-r, r)$ random numbers are generated. The restart initial value is determined based on the given number of random searches (1,000 for conditional or nested logit, and 500 for other models).

   ADDVALUE=( value-list )
      specifies values added to the estimates from the initial stage. A missing value in the list is considered as a zero value for the corresponding estimate. When the ADDVALUE= option is not specified, default values equal to (|estimate| * 1e-3) are added.
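For example, the following sketch (data set and variables assumed) reestimates a conditional logit model from randomly perturbed first-stage estimates, limiting the second stage to 50 iterations:

   model y = x1 x2 / type=clogit
            restart=(addrandom=0.1, addmaxit=50);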
SAMESCALE
   specifies that the parameters of the inclusive values be the same within a group at each level when the nested logit model is estimated.
SEED=number
specifies an initial seed for pseudo-random number generation. The SEED= value must be less than $2^{31} - 1$. If the SEED= value is negative or zero, the time of day from the computer's clock is used to obtain the initial seed. The default is SEED=0.

SPSCALE
   specifies that the parameters of the inclusive values be the same for any choice with only one nested choice within a group, for each level in a nested logit model. This option is useful in analyzing stated preference data.
TYPE=value
   specifies the type of model to be analyzed. The following model types are supported:

   CONDITIONLOGIT | CLOGIT | CL    specifies a conditional logit model.
   HEV                             specifies a heteroscedastic extreme-value model.
   MIXEDLOGIT | MXL                specifies a mixed logit model.
   MULTINOMPROBIT | MPROBIT | MP   specifies a multinomial probit model.
   NESTEDLOGIT | NLOGIT | NL       specifies a nested logit model.
UNITVARIANCE=( number-list )
specifies normalization restrictions on error variances of the multinomial probit model for the choices whose numbers are given in the list. If the UNITVARIANCE= option is specified, it must include at least two choices. Also, for identification, additional zero restrictions are placed on the correlation coefficients for the last choice in the list.

COVEST=value
   specifies the type of covariance matrix. The following types are supported:

   OP        specifies the covariance from the outer product matrix.
   HESSIAN   specifies the covariance from the Hessian matrix.
   QML       specifies the covariance from the outer product and Hessian matrices.

   When COVEST=OP is specified, the outer product matrix is used to compute the covariance matrix of the parameter estimates. The COVEST=HESSIAN option produces the covariance matrix by using the inverse Hessian matrix. The quasi-maximum likelihood estimates are computed with COVEST=QML. The default is COVEST=HESSIAN when the Newton-Raphson method is used. COVEST=OP is the default when the OPTMETHOD=QN option is specified.
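For example, quasi-maximum likelihood standard errors can be requested as in the following sketch (data set and variables assumed):

   model decision = ttime / nchoice=3 type=clogit covest=qml;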
Printing Options

ALL
   requests all printing options.

COVB
   displays the estimated covariances of the parameter estimates.

CORRB
   displays the estimated correlation matrix of the parameter estimates.

ITPRINT
   displays the initial parameter estimates, convergence criteria, and constraints of the optimization. At each iteration, the objective function value, the maximum absolute gradient element, the step size, and the slope of the search direction are printed. The objective function is the full negative log-likelihood function for the maximum likelihood method. When both the ITPRINT option and the NLOPTIONS statement are specified, all printing options in the NLOPTIONS statement are ignored.

NOPRINT
   suppresses all displayed output.
Estimation Control Options

You can also specify detailed optimization options in the NLOPTIONS statement. The OPTMETHOD= option overrides the TECHNIQUE= option in the NLOPTIONS statement. The NLOPTIONS statement is ignored if the OPTMETHOD= option is specified.

INITIAL=( initial-values )
START=( initial-values )
   specifies initial values for some or all of the parameter estimates. The values specified are assigned to model parameters in the same order in which the parameter estimates are displayed in the MDC procedure output. When you use the INITIAL= option, the initial values in the INITIAL= option must satisfy the restrictions specified for the parameter estimates. If they do not, the initial values you specify are adjusted to satisfy the restrictions.
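For example, the following hypothetical statement starts the optimization at -0.5 for the first displayed parameter and -0.1 for the second:

   model decision = ttime cost / nchoice=3 type=clogit
            initial=(-0.5 -0.1);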
MAXITER=number
   sets the maximum number of iterations allowed. The MAXITER= option overrides the MAXITER= option in the NLOPTIONS statement. The default is MAXITER=100.

OPTMETHOD=value
   specifies the optimization technique when the estimation method uses nonlinear optimization. The following techniques are supported:

   QN   specifies the quasi-Newton method.
   NR   specifies the Newton-Raphson method.
   TR   specifies the trust region method.

   The OPTMETHOD=NR option is the same as the TECHNIQUE=NEWRAP option in the NLOPTIONS statement. For the conditional and nested logit models, the default is OPTMETHOD=NR. For other models, the default is OPTMETHOD=QN.
NEST Statement

   NEST LEVEL( level-number ) = ( choices@choice, . . . ) ;
The NEST statement is used when one choice variable contains all possible alternatives and the TYPE=NLOGIT option is specified. The decision tree is constructed based on the NEST statement. When the choice set is specified using multiple CHOICE= variables in the MODEL statement, the NEST statement is ignored.

Consider the following eight choices that are nested in a three-level tree structure.

   Level 1   Level 2   Level 3   top
         1         1         1     1
         2         1         1     1
         3         1         1     1
         4         2         1     1
         5         2         1     1
         6         2         1     1
         7         3         2     1
         8         3         2     1
You can use the following NEST statement to specify the tree structure displayed in Figure 17.22:

   nest level(1) = (1 2 3 @ 1, 4 5 6 @ 2, 7 8 @ 3),
        level(2) = (1 2 @ 1, 3 @ 2),
        level(3) = (1 2 @ 1);
Figure 17.22 A Three-Level Tree
Note that the decision tree is constructed based on the sequence of first-level choice set specification.
Therefore, specifying another order at Level 1 builds a different tree. The following NEST statement builds the tree displayed in Figure 17.23:

   nest level(1) = (4 5 6 @ 2, 1 2 3 @ 1, 7 8 @ 3),
        level(2) = (1 2 @ 1, 3 @ 2),
        level(3) = (1 2 @ 1);
Figure 17.23 An Alternative Three-Level Tree
However, a NEST statement with a different sequence of choice specification at higher levels builds the same tree as displayed in Figure 17.22 if the sequence at the first level is the same:

   nest level(1) = (1 2 3 @ 1, 4 5 6 @ 2, 7 8 @ 3),
        level(2) = (3 @ 2, 1 2 @ 1),
        level(3) = (1 2 @ 1);
The following specifications are equivalent:

   nest level(2) = (3 @ 2, 1 2 @ 1)
   nest level(2) = (3 @ 2, 1 @ 1, 2 @ 1)
   nest level(2) = (1 @ 1, 2 @ 1, 3 @ 2)
Since the MDC procedure contains multiple cases for each individual, it is important to keep the data sequence in the proper order. Consider the four-choice multinomial model with one explanatory variable cost:
   pid   choice   y   cost
     1        1   1     10
     1        2   0     25
     1        3   0     20
     1        4   0     30
     2        1   0     15
     2        2   0     22
     2        3   1     16
     2        4   0     25
The order of the data needs to correspond to the value of choice. Therefore, the following data set is equivalent to the preceding data:

   pid   choice   y   cost
     1        2   0     25
     1        3   0     20
     1        1   1     10
     1        4   0     30
     2        3   1     16
     2        4   0     25
     2        1   0     15
     2        2   0     22
The two-level nested model is estimated with a NEST statement, as follows:

   proc mdc data=one type=nlogit;
      model y = cost / choice=(choice);
      id pid;
      utility u(1,) = cost;
      nest level(1) = (1 2 3 @ 1, 4 @ 2),
           level(2) = (1 2 @ 1);
   run;
The tree is constructed as in Figure 17.24.

Figure 17.24 A Two-Level Tree
Another model is estimated if you specify the decision tree as in Figure 17.25. The different nested tree structure is specified in the following SAS statements:

   proc mdc data=one type=nlogit;
      model y = cost / choice=(choice);
      id pid;
      utility u(1,) = cost;
      nest level(1) = (1 @ 1, 2 3 4 @ 2),
           level(2) = (1 2 @ 1);
   run;
Figure 17.25 An Alternate Two-Level Tree
NLOPTIONS Statement

   NLOPTIONS options ;
PROC MDC uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks. The NLOPTIONS statement specifies nonlinear optimization options. The NLOPTIONS statement must follow the MODEL statement. For a list of all the options of the NLOPTIONS statement, see Chapter 6, “Nonlinear Optimization Methods.”
OUTPUT Statement

   OUTPUT options ;
The OUTPUT statement creates a new SAS data set that contains all the variables in the input data set and, optionally, the estimated linear predictors (XBETA) and predicted probabilities (P). The input data set must be sorted by the choice variables within each ID.

OUT=SAS-data-set
   specifies the name of the output data set.

PRED=variable-name
P=variable-name
   requests the predicted probabilities by naming the variable that contains the predicted probabilities in the output data set.

XBETA=variable-name
   names the variable that contains the linear predictor ($x'\beta$) values. However, the XBETA= option is not supported in the nested logit model.
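For example, the following sketch (data set and variable names assumed) appends predicted probabilities and linear predictors to an output data set named probs:

   proc mdc data=new;
      model decision = autodum ttime / type=clogit nchoice=2;
      id id;
      output out=probs p=phat xbeta=xb;
   run;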
RESTRICT Statement

   RESTRICT restriction1 < , restriction2 . . . > ;
The RESTRICT statement imposes linear restrictions on the parameter estimates. You can specify any number of RESTRICT statements. Each restriction is written as an expression, followed by an equality operator (=) or an inequality operator (<, >, <=, >=), followed by a second expression:

   expression operator expression ;

The operator can be =, <, >, <=, or >=. Restriction expressions can be composed of parameters; multiplication (*), addition (+), and subtraction (-) operators; and constants. Parameters named in restriction expressions must be among the parameters estimated by the model. Parameters associated with a regressor variable are referred to by the name of the corresponding regressor variable. The restriction expressions must be a linear function of the parameters.

Lagrange multipliers are reported for all the active linear constraints. In the displayed output, the Lagrange multiplier estimates are identified with the names Restrict1, Restrict2, and so on. The probability of the Lagrange multipliers is computed using a beta distribution (LaMotte 1994).

The following is an example of using the RESTRICT statement:

   proc mdc data=one;
      model y = x1-x10 /
               type=clogit
               choice=(mode 1 2 3);
      id pid;
      restrict x1*2 ...
Consider the choice probability of the multinomial probit model:

$$P_i(j) = P\left[\epsilon_{i1} - \epsilon_{ij} < (x_{ij} - x_{i1})'\beta, \ldots, \epsilon_{iJ} - \epsilon_{ij} < (x_{ij} - x_{iJ})'\beta\right]$$

The probabilities of choice of the two alternatives can be written as

$$P_i(1) = P\left[\epsilon_{i2} - \epsilon_{i1} < (x_{i1} - x_{i2})'\beta\right]$$
$$P_i(2) = P\left[\epsilon_{i1} - \epsilon_{i2} < (x_{i2} - x_{i1})'\beta\right]$$

where $\begin{bmatrix}\epsilon_{i1}\\ \epsilon_{i2}\end{bmatrix} \sim N\left(0, \begin{bmatrix}\sigma_1^2 & \sigma_{12}\\ \sigma_{12} & \sigma_2^2\end{bmatrix}\right)$. Assume that $x_{i1} = 0$ and $\sigma_{12} = 0$. The binary probit model is estimated and displayed in Output 17.1.3. You do not get the same estimates as those of the usual binary probit model. The probabilities of choice in the binary probit model are

$$P_i(2) = P\left[\epsilon_i < x_i'\beta\right]$$
$$P_i(1) = 1 - P\left[\epsilon_i < x_i'\beta\right]$$

where $\epsilon_i \sim N(0, 1)$. However, the multinomial probit model has the error variance $\mathrm{Var}(\epsilon_{i2} - \epsilon_{i1}) = \sigma_1^2 + \sigma_2^2$ if $\epsilon_{i1}$ and $\epsilon_{i2}$ are independent ($\sigma_{12} = 0$). In the following statements, unit variance restrictions are imposed on choices 1 and 2 ($\sigma_1^2 = \sigma_2^2 = 1$). Therefore, the usual binary probit estimates (and standard errors) can be obtained by multiplying the multinomial probit estimates (and standard errors) in Output 17.1.3 by $1/\sqrt{2}$.

   /*-- Multinomial Probit --*/
   proc mdc data=smdata1;
      model decision = choice2 gpa_2 tuce_2 psi_2 /
               type=mprobit
               nchoice=2
               covest=hess
               unitvariance=(1 2);
      id id;
   run;
Output 17.1.3 Binary Probit Estimates

The MDC Procedure
Multinomial Probit Estimates

Parameter Estimates

   Parameter   DF   Estimate   Standard Error   t Value   Approx Pr > |t|
   choice2      1   -10.5392           3.5956     -2.93            0.0034
   gpa_2        1     2.2992           0.9813      2.34            0.0191
   tuce_2       1     0.0732           0.1186      0.62            0.5375
   psi_2        1     2.0171           0.8415      2.40            0.0165
Example 17.2: Conditional Logit and Data Conversion

In this example, data are prepared for use by the MDCDATA statement. Sometimes, choice-specific information is stored in multiple variables. Since the MDC procedure requires multiple observations for each decision maker, you need to arrange the data so that there is an observation for each subject-alternative (individual-choice) combination. Simple binary choice data are obtained from Ben-Akiva and Lerman (1985). The following statements create the SAS data set:

   data travel;
      length mode $ 8;
      input auto transit mode $;
   datalines;
   52.9   4.4   Transit
    4.1  28.5   Transit
    4.1  86.9   Auto
   56.2  31.6   Transit
   51.8  20.2   Transit
    0.2  91.2   Auto
   27.6  79.7   Auto
   89.9   2.2   Transit
   41.5  24.5   Transit
   95.0  43.5   Transit
... more lines ...
The travel time is stored in two variables, auto and transit. In addition, the chosen alternatives are stored in a character variable, mode. The choice variable, mode, is converted to a numeric variable, decision, since the MDC procedure supports only numeric variables. The following statements convert the original data set, travel, and estimate the binary logit model. The first 10 observations of a relevant subset of the new data set and the parameter estimates are displayed in Output 17.2.1 and Output 17.2.2, respectively.

   data new;
      set travel;
      retain id 0;
      id+1;
      /*-- create auto variable --*/
      decision = (upcase(mode) = 'AUTO');
      ttime = auto;
      autodum = 1;
      trandum = 0;
      output;
      /*-- create transit variable --*/
      decision = (upcase(mode) = 'TRANSIT');
      ttime = transit;
      autodum = 0;
      trandum = 1;
      output;
   run;

   proc print data=new(obs=10);
      var decision autodum trandum ttime;
      id id;
   run;
Output 17.2.1 Converted Data

   id   decision   autodum   trandum   ttime
    1          0         1         0    52.9
    1          1         0         1     4.4
    2          0         1         0     4.1
    2          1         0         1    28.5
    3          1         1         0     4.1
    3          0         0         1    86.9
    4          0         1         0    56.2
    4          1         0         1    31.6
    5          0         1         0    51.8
    5          1         0         1    20.2
The following statements perform the binary logit estimation:

   proc mdc data=new;
      model decision = autodum ttime /
               type=clogit
               nchoice=2;
      id id;
   run;
Output 17.2.2 Binary Logit Estimation of Modal Choice Data

The MDC Procedure
Conditional Logit Estimates

Parameter Estimates

   Parameter   DF   Estimate   Standard Error   t Value   Approx Pr > |t|
   autodum      1    -0.2376           0.7505     -0.32            0.7516
   ttime        1    -0.0531           0.0206     -2.57            0.0101
In order to handle more general cases, you can use the MDCDATA statement. Choice-specific dummy variables are generated and multiple observations for each individual are created. The following example converts the original data set travel by using the MDCDATA statement and performs conditional logit analysis. Interleaved data are output into the new data set new3. This data set has twice as many observations as the original travel data set.

   proc mdc data=travel;
      mdcdata varlist( x1 = (auto transit) )
              select=mode
              id=id
              alt=alternative
              decvar=Decision / out=new3;
      model decision = auto x1 /
               nchoice=2
               type=clogit;
      id id;
   run;
The first nine observations of the modified data set are shown in Output 17.2.3. The result of the preceding program is listed in Output 17.2.4.
Output 17.2.3 Transformed Model Choice Data

   Obs   MODE      AUTO   TRANSIT     X1   ID   ALTERNATIVE   DECISION
     1   TRANSIT      1         0   52.9    1             1          0
     2   TRANSIT      0         1    4.4    1             2          1
     3   TRANSIT      1         0    4.1    2             1          0
     4   TRANSIT      0         1   28.5    2             2          1
     5   AUTO         1         0    4.1    3             1          1
     6   AUTO         0         1   86.9    3             2          0
     7   TRANSIT      1         0   56.2    4             1          0
     8   TRANSIT      0         1   31.6    4             2          1
     9   TRANSIT      1         0   51.8    5             1          0
Output 17.2.4 Results Using MDCDATA Statement

The MDC Procedure
Conditional Logit Estimates

Parameter Estimates

   Parameter   DF   Estimate   Standard Error   t Value   Approx Pr > |t|
   AUTO         1    -0.2376           0.7505     -0.32            0.7516
   X1           1    -0.0531           0.0206     -2.57            0.0101
Example 17.3: Correlated Choice Modeling

Often, it is not realistic to assume that the random components of utility for all choices are independent. This example shows the solution to the problem of correlated random components by using multinomial probit and nested logit.

To analyze correlated data, trinomial choice data (1,000 observations) are created using a pseudo-random number generator by using the following statements. The random utility function is

$$U_{ij} = V_{ij} + \epsilon_{ij}, \quad j = 1, 2, 3$$

where

$$\epsilon_{ij} \sim N\left(0, \begin{bmatrix} 2 & 0.6 & 0 \\ 0.6 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right)$$

   /*-- generate simulated series --*/
   %let ndim = 3;
   %let nobs = 1000;
   data trichoice;
      array error{&ndim} e1-e3;
      array vtemp{&ndim} _temporary_;
      array lm{6} _temporary_ (1.4142136 0.4242641 0.9055385 0 0 1);
      retain nseed 345678;
      do id = 1 to &nobs;
         index = 0;
         /* generate independent normal variates */
         do i = 1 to &ndim;
            /* index of diagonal element */
            vtemp{i} = rannor(nseed);
         end;
         /* get multivariate normal variates */
         index = 0;
         do i = 1 to &ndim;
            error{i} = 0;
            do j = 1 to i;
               error{i} = error{i} + lm{index+j}*vtemp{j};
            end;
            index = index + i;
         end;
         x1 = 1.0 + 2.0 * ranuni(nseed);
         x2 = 1.2 + 2.0 * ranuni(nseed);
         x3 = 1.5 + 1.2 * ranuni(nseed);
         util1 = 2.0 * x1 + e1;
         util2 = 2.0 * x2 + e2;
         util3 = 2.0 * x3 + e3;
         do i = 1 to &ndim;
            vtemp{i} = 0;
         end;
         if ( util1 > util2 & util1 > util3 ) then
            vtemp{1} = 1;
         else if ( util2 > util1 & util2 > util3 ) then
            vtemp{2} = 1;
         else if ( util3 > util1 & util3 > util2 ) then
            vtemp{3} = 1;
         else continue;
         /*-- first choice --*/
         x = x1;
         mode = 1;
         decision = vtemp{1};
         output;
         /*-- second choice --*/
         x = x2;
         mode = 2;
         decision = vtemp{2};
         output;
         /*-- third choice --*/
         x = x3;
         mode = 3;
         decision = vtemp{3};
         output;
      end;
   run;
First, the multinomial probit model is estimated (see the following statements). Results show that the standard deviation, correlation, and slope estimates are close to the parameter values. Note that $\rho_{12} = \frac{\sigma_{12}}{\sqrt{(\sigma_1^2)(\sigma_2^2)}} = \frac{0.6}{\sqrt{(2)(1)}} = 0.42$, $\sigma_1 = \sqrt{2} = 1.41$, $\sigma_2 = \sqrt{1} = 1$, and the parameter value for the variable x is 2.0. (See Output 17.3.1.)

   /*-- Trinomial Probit --*/
   proc mdc data=trichoice randnum=halton nsimul=100;
      model decision = x /
               type=mprobit
               choice=(mode 1 2 3)
               covest=op
               optmethod=qn;
      id id;
   run;
Output 17.3.1 Trinomial Probit Model Estimation

The MDC Procedure
Multinomial Probit Estimates

Parameter Estimates

   Parameter   DF   Estimate   Standard Error   t Value   Approx Pr > |t|
   x            1     1.7987           0.1202     14.97            <.0001
   STD_1        1     1.2824           0.1468      8.74            <.0001
   RHO_21       1     0.4233           0.1041      4.06            <.0001
Figure 17.29 shows a two-level decision tree.

Figure 17.29 Nested Tree Structure
   ERRORMODEL equation-name ~ distribution < CDF=( CDF(options) ) > ;
   ESTIMATE item1 < , item2 . . . > < ,/ options > ;
   EXOGENOUS variable < initial values > . . . ;
   FIT equations < PARMS=( parameter values . . . ) > < START=( parameter values . . . ) > < DROP=( parameters ) > < / options > ;
   FORMAT variable-list < format > < DEFAULT= default-format > ;
   GOTO statement-label ;
   ID variable-list ;
   IF expression ;
   IF expression THEN programming-statement1 ; < ELSE programming-statement2 > ;
   variable = expression ;
   variable + expression ;
   INCLUDE model-file . . . ;
   INSTRUMENTS < instruments > < _EXOG_ > < EXCLUDE=( parameters ) > < / options > ;
   KEEP variable . . . ;
   LABEL variable ='label' . . . ;
   LENGTH variable-list < $ > length . . . < DEFAULT=length > ;
   LINK statement-label ;
   MOMENT variable-list = moment-specification . . . ;
   OUTVARS variable . . . ;
   PARAMETERS variable1 < value1 > < variable2 < value2 . . . > > ;
   PUT print-item . . . < @ > < @@ > ;
   RANGE variable < = first > < TO last > ;
   RENAME old-name1 = new-name1 < . . . old-name2 = new-name2 > ;
   RESET options ;
   RESTRICT restriction1 < , restriction2 . . . > ;
   RETAIN variable-list1 value1 < variable-list2 value2 . . . > ;
   RETURN ;
   SOLVE variable-list < SATISFY=(equations) > < / options > ;
   SUBSTR ( variable, index, length ) = expression ;
   SELECT < ( expression ) > ;
   OTHERWISE programming-statement ;
   STOP ;
   TEST < "name" > test1 < , test2 . . . > < ,/ options > ;
   VAR variable < initial-values > . . . ;
   WEIGHT variable ;
   WHEN ( expression ) programming-statement ;
Functional Summary

The statements and options in the MODEL procedure are summarized in the following table.

Description | Statement | Option

Data Set Options
specify the input data set for the variables | FIT, SOLVE | DATA=
specify the input data set for parameters | FIT, SOLVE | ESTDATA=
specify the method for handling missing values | FIT | MISSING=
specify the input data set for parameters | MODEL | PARMSDATA=
request that the procedure produce graphics via the Output Delivery System | MODEL | PLOTS=
specify the output data set for residual, predicted, or actual values | FIT | OUT=
specify the output data set for solution mode results | SOLVE | OUT=
write the actual values to the OUT= data set | FIT | OUTACTUAL
select all output options | FIT | OUTALL
write the covariance matrix of the estimates | FIT | OUTCOV
write the parameter estimates to a data set | FIT | OUTEST=
write the parameter estimates to a data set | MODEL | OUTPARMS=
write the observations used to start the lags | SOLVE | OUTLAGS
write the predicted values to the OUT= data set | FIT | OUTPREDICT
write the residual values to the OUT= data set | FIT | OUTRESID
write the covariance matrix of the equation errors to a data set | FIT | OUTS=
write the S matrix used in the objective function definition to a data set | FIT | OUTSUSED=
write the estimate of the variance matrix of the moment generating function | FIT | OUTV=
read the covariance matrix of the equation errors | FIT, SOLVE | SDATA=
read the covariance matrix for GMM and ITGMM | FIT | VDATA=
specify the name of the time variable | FIT, SOLVE, MODEL | TIME=
select the estimation type to read | FIT, SOLVE | TYPE=

General ESTIMATE Statement Options
specify the name of the data set in which the estimates of the functions of the parameters are to be written | ESTIMATE | OUTEST=
write the covariance matrix of the functions of the parameters to the OUTEST= data set | ESTIMATE | OUTCOV
print the covariance matrix of the functions of the parameters | ESTIMATE | COVB
print the correlation matrix of the functions of the parameters | ESTIMATE | CORRB

Printing Options for FIT Tasks
print the modified Breusch-Pagan test for heteroscedasticity | FIT | BREUSCH
print the Chow test for structural breaks | FIT | CHOW=
print collinearity diagnostics | FIT | COLLIN
print the correlation matrices | FIT | CORR
print the correlation matrix of the parameters | FIT | CORRB
print the correlation matrix of the residuals | FIT | CORRS
print the covariance matrices | FIT | COV
print the covariance matrix of the parameters | FIT | COVB
print the covariance matrix of the residuals | FIT | COVS
print Durbin-Watson d statistics | FIT | DW
print first-stage R2 statistics | FIT | FSRSQ
print Godfrey's tests for autocorrelated residuals for each equation | FIT | GODFREY
print Hausman's specification test | FIT | HAUSMAN
print tests of normality of the model residuals | FIT | NORMAL
print the predictive Chow test for structural breaks | FIT | PCHOW=
specify all the printing options | FIT | PRINTALL
print White's test for heteroscedasticity | FIT | WHITE

Options to Control FIT Iteration Output
print the inverse of the crossproducts Jacobian matrix | FIT | I
print a summary iteration listing | FIT | ITPRINT
print a detailed iteration listing | FIT | ITDETAILS
print the crossproduct Jacobian matrix | FIT | XPX
specify all the iteration printing-control options | FIT | ITALL

Options to Control the Minimization Process
specify the convergence criteria | FIT | CONVERGE=
select the Hessian approximation used for FIML | FIT | HESSIAN=
specify the local truncation error bound for the integration | FIT, SOLVE, MODEL | LTEBOUND=
specify the maximum number of iterations allowed | FIT | MAXITER=
specify the maximum number of subiterations allowed | FIT | MAXSUBITER=
select the iterative minimization method to use | FIT | METHOD=
specify the smallest allowed time step to be used in the integration | FIT, SOLVE, MODEL | MINTIMESTEP=
modify the iterations for estimation methods that iterate the S matrix or the V matrix | FIT | NESTIT
specify the smallest pivot value | MODEL, FIT, SOLVE | SINGULAR
specify the number of minimization iterations to perform at each grid point | FIT | STARTITER=
specify a weight variable | WEIGHT |

Options to Read and Write Model Files
read a model from one or more input model files | INCLUDE | MODEL=
suppress the default output of the model file | MODEL, RESET | NOSTORE
specify the name of an output model file | MODEL, RESET | OUTMODEL=
delete the current model | RESET | PURGE

Options to List or Analyze the Structure of the Model
print a dependency structure of a model | MODEL | BLOCK
print a graph of the dependency structure of a model | MODEL | GRAPH
print the model program and variable lists | MODEL | LIST
print the derivative tables and compiled model program code | MODEL | LISTCODE
print a dependency list | MODEL | LISTDEP
print a table of derivatives | MODEL | LISTDER
print a cross-reference of the variables | MODEL | XREF

General Printing Control Options
expand parts of the printed output | FIT, SOLVE | DETAILS
print a message for each statement as it is executed | FIT, SOLVE | FLOW
select the maximum number of execution errors that can be printed | FIT, SOLVE | MAXERRORS=
select the number of decimal places shown in the printed output | FIT, SOLVE | NDEC=
suppress the normal printed output | FIT, SOLVE | NOPRINT
specify all the noniteration printing options | FIT, SOLVE | PRINTALL
print the result of each operation as it is executed | FIT, SOLVE | TRACE
request a comprehensive memory usage summary | FIT, SOLVE, MODEL, RESET | MEMORYUSE
turn off the NOPRINT option | RESET | PRINT

Statements that Declare Variables
associate a name with a list of variables and constants | ARRAY |
declare a variable to have a fixed value | CONTROL |
declare a variable to be a dependent or endogenous variable | ENDOGENOUS |
declare a variable to be an independent or exogenous variable | EXOGENOUS |
specify identifying variables | ID |
assign a label to a variable | LABEL |
select additional variables to be output | OUTVARS |
declare a variable to be a parameter | PARAMETERS |
force a variable to hold its value from a previous observation | RETAIN |
declare a model variable | VAR |
declare an instrumental variable | INSTRUMENTS |
omit the default intercept term in the instruments list | INSTRUMENTS | NOINT

General FIT Statement Options
omit parameters from the estimation | FIT | DROP=
associate a variable with an initial value as a parameter or a constant | FIT | INITIAL=
bypass OLS to get initial parameter estimates for GMM, ITGMM, or FIML | FIT | NOOLS
bypass 2SLS to get initial parameter estimates for GMM, ITGMM, or FIML | FIT | NO2SLS
specify the parameters to estimate | FIT | PARMS=
request confidence intervals on estimated parameters | FIT | PRL=
select a grid search | FIT | START=

Options to Control the Estimation Method Used
specify nonlinear ordinary least squares | FIT | OLS
specify iterated nonlinear ordinary least squares | FIT | ITOLS
specify seemingly unrelated regression | FIT | SUR
specify iterated seemingly unrelated regression | FIT | ITSUR
specify two-stage least squares | FIT | 2SLS
specify iterated two-stage least squares | FIT | IT2SLS
specify three-stage least squares | FIT | 3SLS
specify iterated three-stage least squares | FIT | IT3SLS
specify full information maximum likelihood | FIT | FIML
specify simulated method of moments | FIT | NDRAW
specify number of draws for the V matrix | FIT | NDRAWV
specify number of initial observations for SMM | FIT | NPREOBS
select the variance-covariance estimator used for FIML | FIT | COVBEST=
specify generalized method of moments | FIT | GMM
specify the kernel for GMM and ITGMM | FIT | KERNEL=
specify iterated generalized method of moments | FIT | ITGMM
specify the type of generalized inverse used for the covariance matrix | FIT | GINV=
specify the denominator for computing variances and covariances | FIT | VARDEF=
specify adding the variance adjustment for SMM | FIT | ADJSMMV
specify variance correction for heteroscedasticity | FIT | HCCME=
specify GMM variance under arbitrary weighting matrix | FIT | GENGMMV
specify GMM variance under optimal weighting matrix | FIT | NOGENGMMV

Solution Mode Options
select a subset of the model equations | SOLVE | SATISFY=
solve only for missing variables | SOLVE | FORECAST
solve for all solution variables | SOLVE | SIMULATE

Solution Mode Options: Lag Processing
use solved values in the lag functions | SOLVE | DYNAMIC
use actual values in the lag functions | SOLVE | STATIC
produce successive forecasts to a fixed forecast horizon | SOLVE | NAHEAD=
select the observation to start dynamic solutions | SOLVE | START=

Solution Mode Options: Numerical Methods
specify the maximum number of iterations allowed | SOLVE | MAXITER=
specify the maximum number of subiterations allowed | SOLVE | MAXSUBITER=
specify the convergence criteria | SOLVE | CONVERGE=
compute a simultaneous solution using a Jacobi-like iteration | SOLVE | JACOBI
compute a simultaneous solution using a Gauss-Seidel-like iteration | SOLVE | SEIDEL
compute a simultaneous solution using Newton's method | SOLVE | NEWTON
compute a nonsimultaneous solution | SOLVE | SINGLE

Monte Carlo Simulation Options
specify quasi-random number generator | SOLVE | QUASI=
specify pseudo-random number generator | SOLVE | PSUEDO=
repeat the solution multiple times | SOLVE | RANDOM=
initialize the pseudo-random number generator | SOLVE | SEED=
specify copula options | SOLVE | COPULA=

Solution Mode Printing Options
print between data points integration values for the DERT. variables and the auxiliary variables | FIT, SOLVE, MODEL | INTGPRINT
print the solution approximation and equation errors | SOLVE | ITPRINT
print the solution values and residuals at each observation | SOLVE | SOLVEPRINT
print various summary statistics | SOLVE | STATS
print tables of Theil inequality coefficients | SOLVE | THEIL
specify all printing control options | SOLVE | PRINTALL

General TEST Statement Options
specify that a Wald test be computed | TEST | WALD
specify that a Lagrange multiplier test be computed | TEST | LM
specify that a likelihood ratio test be computed | TEST | LR
request all three types of tests | TEST | ALL
specify the name of an output SAS data set that contains the test results | TEST | OUT=

Miscellaneous Statements
specify the range of observations to be used | RANGE |
subset the data set with BY variables | BY |
PROC MODEL Statement

   PROC MODEL options ;
The following options can be specified in the PROC MODEL statement. All of the nonassignment options (the options that do not accept a value after an equal sign) can have NO prefixed to the option name in the RESET statement to turn the option off. The default case is not explicitly indicated in the discussion that follows. Thus, for example, the option DETAILS is documented in the following, but NODETAILS is not documented since it is the default. Also, the NOSTORE option is documented because STORE is the default.
Data Set Options

DATA=SAS-data-set
   names the input data set. Variables in the model program are looked up in the DATA= data set and, if found, their attributes (type, length, label, format) are set to be the same as those in the input data set (if not previously defined otherwise). The values for the variables in the program are read from the input data set when the model is estimated or simulated by FIT and SOLVE statements.

OUTPARMS=SAS-data-set
   writes the parameter estimates to a SAS data set. See the section "Output Data Sets" on page 1160 for details.

PARMSDATA=SAS-data-set
   names the SAS data set that contains the parameter estimates. In PROC MODEL, you have several options to specify starting values for the parameters to be estimated. When more than one option is specified, the options are implemented in the following order of precedence (from highest to lowest): the START= option, the PARMS statement initialization value, the ESTDATA= option, and the PARMSDATA= option. If no options are specified for the starting value, the default value of 0.0001 is used. See the section "Input Data Sets" on page 1154 for details.

PLOTS=global-plot-options | plot-request
   requests that the MODEL procedure produce statistical graphics via the Output Delivery System, provided that the ODS GRAPHICS statement has been specified. For general information about ODS Graphics, see Chapter 21, "Statistical Graphics Using ODS" (SAS/STAT User's Guide). The global-plot-options apply to all relevant plots generated by the MODEL procedure. The global-plot-options supported by the MODEL procedure follow.
Global Plot Options

ONLY
   suppresses the default plots. Only the plots specifically requested are produced.

UNPACKPANEL
   breaks a graphic that is otherwise paneled into individual component plots.
Specific Plot Options

ALL
   requests that all plots appropriate for the particular analysis be produced.

ACF
   produces the autocorrelation function plot.

IACF
   produces the inverse autocorrelation function plot of residuals.

PACF
   produces the partial autocorrelation function plot of residuals.

FITPLOT
   plots the predicted and actual values.

COOKSD
   produces the Cook's D plot.

QQ
   produces a QQ plot of residuals.

RESIDUAL | RES
   plots the residuals.

STUDENTRESIDUAL
   plots the studentized residuals.

RESIDUALHISTOGRAM | RESIDHISTOGRAM
   plots the histogram of residuals.

NONE
   suppresses all plots.
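For example, the following sketch requests the fit plot and the residual histogram; the data set and the simple linear model are assumptions made for illustration:

   ods graphics on;
   proc model data=sashelp.class plots=(fitplot residualhistogram);
      parms a b;                     /* intercept and slope */
      weight = a + b * height;      /* linear model of weight on height */
      fit weight;
   run;
   ods graphics off;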
Options to Read and Write Model Files

MODEL=model-name
MODEL=(model-list)
   reads the model from one or more input model files created by previous PROC MODEL executions. Model files are written by the OUTMODEL= option.

NOSTORE
   suppresses the default output of the model file. This option is applicable only when FIT or SOLVE statements are not used, the MODEL= option is not used, and when a model is specified.

OUTMODEL=model-name
   specifies the name of an output model file to which the model is to be written. Starting with SAS 9.2, model files are stored as XML-based SAS data sets instead of being stored as members of a SAS catalog as in earlier releases. This makes model files more readily extendable in the future and enables Java-based applications to read the model files directly. To change this behavior, use the global CMPMODEL options. You can choose the format in which the output model file is stored and read by using the CMPMODEL=global-CMPMODEL-options in an OPTIONS statement as follows:

      OPTIONS CMPMODEL=global-CMPMODEL-options;

   The global CMPMODEL options are:

   CATALOG   specifies that model files be written and read from SAS catalogs only.
   XML       specifies that model files be written and read from XML data sets only.
   BOTH      specifies that model files be written to both XML and CATALOG formats. When BOTH is specified, model files are read from the XML data set first and read from the SAS catalog only if the data set is not found. This is the default option.
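As an illustrative sketch, a model can be stored with OUTMODEL= and reused later with MODEL=, without restating the model program (the file and variable names are assumptions):

   /* store the fitted model in the model file linmod */
   proc model data=sashelp.class outmodel=linmod;
      parms a b;
      weight = a + b * height;
      fit weight;
   run;

   /* read the stored model back and refit it */
   proc model model=linmod data=sashelp.class;
      fit weight;
   run;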
Options to List or Analyze the Structure of the Model

These options produce reports on the structure of the model or list the programming statements that define the models. These options are automatically reset (turned off) after the reports are printed. To turn these options back on after a RUN statement has been entered, use the RESET statement or specify the options on a FIT or SOLVE statement.

BLOCK
   prints an analysis of the structure of the model given by the assignments to model variables appearing in the model program. This analysis includes a classification of model variables into endogenous (dependent) and exogenous (independent) groups based on the presence of the variable on the left-hand side of an assignment statement. The endogenous variables are grouped into simultaneously determined blocks. The dependency structure of the simultaneous blocks and exogenous variables is also printed. The BLOCK option cannot analyze dependencies implied by general form equations.

GRAPH
   prints the graph of the dependency structure of the model. The GRAPH option also invokes the BLOCK option and produces a graphical display of the information listed by the BLOCK option.

LIST
   prints the model program and variable lists, including the statements added by PROC MODEL and macros.

LISTALL
   selects the LIST, LISTDEP, LISTDER, and LISTCODE options.

LISTCODE
   prints the derivative tables and compiled model program code. LISTCODE is a debugging feature and is not normally needed.

LISTDEP
   prints a report that lists, for each variable in the model program, the variables that depend on it and that it depends on. These lists are given separately for current-period values and for lagged values of the variables. The information displayed is the same as that used to construct the BLOCK report but differs in that the information is listed for all variables (including parameters, control variables, and program variables), not just for the model variables. Classification into endogenous and exogenous groups and analysis of simultaneous structure is not done by the LISTDEP report.

LISTDER
   prints a table of derivatives for FIT and SOLVE tasks. (The LISTDER option is applicable only for the default NEWTON method for SOLVE tasks.) The derivatives table shows each nonzero derivative computed for the problem. The derivative listed can be a constant, a variable in the model program, or a special derivative variable created to hold the result of the derivative expression. This option is turned on by the LISTCODE and PRINTALL options.

XREF
   prints a cross-reference of the variables in the model program that shows where each variable was referenced or given a value. The XREF option is normally used in conjunction with the LIST option. A more detailed description is given in the section "Diagnostics and Debugging" on page 1217.
General Printing Control Options

DETAILS
   specifies the detailed printout. Parts of the printed output are expanded when the DETAILS option is specified. If ODS GRAPHICS ON is specified, the following additional graphs of the residuals are produced: ACF, PACF, IACF, white noise, and QQ plot versus the normal.

FLOW
   prints a message for each statement in the model program as it is executed. This debugging option is needed very rarely and produces voluminous output.

MAXERRORS=n
   specifies the maximum number of execution errors that can be printed. The default is MAXERRORS=50.

NDEC=n
   specifies the precision of the format that PROC MODEL uses when printing various numbers. The default is NDEC=3, which means that PROC MODEL attempts to print values by using the D format but ensures that at least three significant digits are shown. If the NDEC= value is greater than nine, the BEST. format is used. The smallest value allowed is NDEC=2. The NDEC= option affects the format of most, but not all, of the floating point numbers that PROC MODEL can print. For some values (such as parameter estimates), a precision limit one or two digits greater than the NDEC= value is used. This option does not apply to the precision of the variables in the output data set.

NOPRINT
   suppresses the normal printed output but does not suppress error listings. Using any other print option turns the NOPRINT option off. The PRINT option can be used with the RESET statement to turn off NOPRINT.

PRINTALL
   turns on all the printing-control options. The options set by PRINTALL are DETAILS; the model information options LIST, LISTDEP, LISTDER, XREF, BLOCK, and GRAPH; the FIT task printing options FSRSQ, COVB, CORRB, COVS, CORRS, DW, and COLLIN; and the SOLVE task printing options STATS, THEIL, SOLVEPRINT, and ITPRINT.

TRACE
   prints the result of each operation in each statement in the model program as it is executed, in addition to the information printed by the FLOW option. This debugging option is needed very rarely and produces voluminous output.

MEMORYUSE
   prints a report of the memory required for the various parts of the analysis.
FIT Task Options

The following options are used in the FIT statement (parameter estimation) and can also be used in the PROC MODEL statement: COLLIN, CONVERGE=, CORR, CORRB, CORRS, COVB, COVBEST=, COVS, DW, FIML, FSRSQ, GMM, HESSIAN=, I, INTGPRINT, ITALL, ITDETAILS, ITGMM, ITPRINT, ITOLS, ITSUR, IT2SLS, IT3SLS, KERNEL=, LTEBOUND=, MAXITER=, MAXSUBITER=, METHOD=, MINTIMESTEP=, NESTIT, N2SLS, N3SLS, OLS, OUTPREDICT, OUTRESID, OUTACTUAL, OUTLAGS, OUTALL, OUTCOV, SINGULAR=, STARTITER=, SUR, TIME=, VARDEF, and XPX. See the section "FIT Statement" on page 1033 for a description of these options. When used in the PROC MODEL or RESET statement, these are default options for subsequent FIT statements. For example, the statement
makes two-stage least squares the default parameter estimation method for FIT statements that do not specify an estimation method.
SOLVE Task Options The following options used in the SOLVE statement can also be used in the PROC MODEL statement: CONVERGE=, DYNAMIC, FORECAST, INTGPRINT, ITPRINT, JACOBI, LTEBOUND=, MAXITER=, MAXSUBITER=, MINTIMESTEP=, NAHEAD=, NEWTON, OUTPREDICT, OUTRESID, OUTACTUAL, OUTLAGS, OUTERRORS, OUTALL, SEED=, SEIDEL, SIMULATE, SINGLE, SINGULAR=, SOLVEPRINT, START=, STATIC, STATS, THEIL, TIME=, and TYPE=. See the section “SOLVE Statement” on page 1050 for a description of these options. When used in the PROC MODEL or RESET statement, these options provide default values for subsequent SOLVE statements.
BOUNDS Statement

   BOUNDS bound1 < , bound2 . . . > ;
The BOUNDS statement imposes simple boundary constraints on the parameter estimates. BOUNDS statement constraints refer to the parameters estimated by the associated FIT statement (that is, to either the preceding FIT statement or, in the absence of a preceding FIT statement, to the following FIT statement). You can specify any number of BOUNDS statements.

Each bound is composed of parameters and constants and inequality operators:

   item operator item < operator item < operator item . . . > >

Each item is a constant, the name of an estimated parameter, or a list of parameter names. Each operator is <, >, <=, or >=.

You can use both the BOUNDS statement and the RESTRICT statement to impose boundary constraints; however, the BOUNDS statement provides a simpler syntax for specifying these kinds of constraints. See the section "RESTRICT Statement" on page 1049 for more information about the computational details of estimation with inequality restrictions.

Lagrange multipliers are reported for all the active boundary constraints. In the printed output and in the OUTEST= data set, the Lagrange multiplier estimates are identified with the names BOUND0, BOUND1, and so forth. The probability of the Lagrange multipliers is computed using a beta distribution (LaMotte 1994). To give the constraints more descriptive names, use the RESTRICT statement instead of the BOUNDS statement.

The following BOUNDS statement constrains the estimates of the parameters A and B and the ten parameters P1 through P10 to be between zero and one. This example illustrates the use of parameter lists to specify boundary constraints.

   bounds 0 < a b p1-p10 < 1;
The following statements are an example of the use of the BOUNDS statement and produce the output shown in Figure 18.13:

   title 'Holzman Function (1969), Himmelblau No. 21, N=3';
   data zero;
      do i = 1 to 99;
         output;
      end;
   run;

   proc model data=zero;
      parms x1= 100 x2= 12.5 x3= ...
      bounds .1 ...
specifies the 2 distribution. This option is supported only for simulation. The arguments correspond to the arguments of the SAS CDF function (ignoring the random variable argument). GENERAL(Likelihood < , parm1, parm2, : : : parmn > )
specifies the negative of a general log-likelihood function that you construct by using SAS programming statements. The procedure minimizes the negative log-likelihood function specified. parm1; parm2; : : : parmn are optional parameters for this distribution and are used for documentation purposes only. F( ndf, ddf < , nc > )
specifies the F distribution. This option is supported only for simulation. The arguments correspond to the arguments of the SAS CDF function (ignoring the random variable argument). NORMAL( v1 v2 : : : vn )
specifies a multivariate normal (Gaussian) distribution with mean 0 and variances v1 through vn . POISSON( mean )
specifies the Poisson distribution. This option is supported only for simulation. The arguments correspond to the arguments of the SAS CDF function (ignoring the random variable argument). T( v1 v2 … vn , df )

specifies a multivariate t distribution with noncentrality 0, variances $v_1$ through $v_n$, and common degrees of freedom df. UNIFORM( < left, right > )
specifies the uniform distribution. This option is supported only for simulation. The arguments correspond to the arguments of the SAS CDF function (ignoring the random variable argument).
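These distribution options are specified in the ERRORMODEL statement. The following sketch illustrates the form such a specification might take; the equation name y, the variance parameter sig2, and the degrees-of-freedom parameter df are assumed names, not part of the original text:

   /* t-distributed errors for equation y */
   errormodel y ~ t( sig2, df );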
Options to Specify the CDF for Simulation CDF=( CDF(options) )
specifies the univariate distribution that is used for simulation so that the estimation can be done for one set of distributional assumptions and the simulation for another. The CDF can be any of the distributions from the previous section with the exception of the general likelihood. In addition, you can specify the empirical distribution of the residuals. EMPIRICAL= ( < TAILS=(options) > )
uses the sorted residual data to create an empirical CDF. TAILS=( tail-options )
specifies how to handle the tails in computing the inverse CDF from an empirical distribution, where tail-options are:
NORMAL
specifies the normal distribution to extrapolate the tails.
T( df )
specifies the t distribution to extrapolate the tails.
PERCENT= p
specifies the percentage of the observations to use in constructing each tail. The default for the PERCENT= option is 10. A normal distribution or a t distribution is used to extrapolate the tails to infinity. The variance for the tail distribution is obtained from the data so that the empirical CDF is continuous.
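The following sketch shows how these options might fit together; the equation name y and the variance parameter sig2 are assumptions. The parameters are estimated under a normal error distribution, while simulation draws from the empirical CDF of the residuals, with t-distribution tails fit to the outer 5 percent of observations on each side:

   errormodel y ~ normal( sig2 )
                  cdf=( empirical=( tails=( t( 5 ) percent=5 ) ) );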
ESTIMATE Statement ESTIMATE item < , item . . . > < ,/ options > ;
The ESTIMATE statement computes estimates of functions of the parameters. The ESTIMATE statement refers to the parameters estimated by the associated FIT statement (that is, to either the preceding FIT statement or, in the absence of a preceding FIT statement, to the following FIT statement). You can use any number of ESTIMATE statements.

Let $h(\theta)$ denote the function of parameters that needs to be estimated. Let $\hat{\theta}$ denote the unconstrained estimate of the parameter of interest, $\theta$. Let $\hat{V}$ be the estimate of the covariance matrix of $\hat{\theta}$. Denote

$$A(\theta) = \left.\partial h(\theta)/\partial \theta\right|_{\hat{\theta}}$$

Then the standard error of the parameter function estimate is computed by obtaining the square root of $A(\hat{\theta})\,\hat{V}\,A'(\hat{\theta})$. This is the same as the variance needed for a Wald-type test statistic with null hypothesis $h(\theta) = 0$.

If the expression of the function in the ESTIMATE statement includes a variable, then the value used in computing the function estimate is the last observation of the variable in the DATA= data set.
If you specify options on the ESTIMATE statement, a comma is required before the “/” character that separates the expressions from the options, since the “/” character can also be used within expressions to indicate division. Each item is written as an optional name followed by an expression, < "name" > expression
where "name" is a string used to identify the estimate in the printed output and in the OUTEST= data set. Expressions can be composed of parameter names, arithmetic operators, functions, and constants. Comparison operators (such as = or |t|
Join point plateau
12.7504 0.777516
1.2785 0.0123
9.97 63.10
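A sketch of the kind of program that produces such estimates follows; the data set name, variable names, and starting values are assumptions for illustration. The model is a quadratic that levels off at a plateau, and the ESTIMATE statement computes the join point and the plateau height as functions of the fitted parameters:

   proc model data=crop;
      parms a 0.45 b 0.5 c -0.0025;
      x0 = -0.5*b / c;                        /* join point as a function of b and c */
      if x < x0 then y = a + b*x + c*x*x;     /* quadratic segment */
      else           y = a + b*x0 + c*x0*x0;  /* plateau segment   */
      fit y;
      estimate 'Join point' x0,
               'plateau'    a + b*x0 + c*x0*x0;
   run;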
FIT Statement FIT < equations > < PARMS=( parameter < values > . . . ) > < START=( parameter values . . . ) > < DROP=( parameter . . . ) > < INITIAL=( variable < = parameter | constant > . . . ) > < / options > ;
The FIT statement estimates model parameters by fitting the model equations to input data and optionally selects the equations to be fit. If the list of equations is omitted, all model equations that contain parameters are fitted. The following options can be used in the FIT statement. DROP= ( parameters . . . )
specifies that the named parameters not be estimated. All the parameters in the equations fit are estimated except those listed in the DROP= option. The dropped parameters retain their previous values and are not changed by the estimation. INITIAL= ( variable = < parameter | constant > . . . )
associates a variable with an initial value as a parameter or a constant. This option applies only to ordinary differential equations. See the section “Ordinary Differential Equations” on page 1116 for more information. PARMS= ( parameters [values] . . . )
selects a subset of the parameters for estimation. When the PARMS= option is used, only the named parameters are estimated. Any parameters not specified in the PARMS= list retain their previous values and are not changed by the estimation. In PROC MODEL, you have several options to specify starting values for the parameters to be estimated. When more than one option is specified, the options are implemented in the following order of precedence (from highest to lowest): the START= option, the PARMS statement initialization value, the ESTDATA= option, and the PARMSDATA= option. If no options are specified for the starting value, the default value of 0.0001 is used. PRL= WALD | LR | BOTH
requests confidence intervals on estimated parameters. By default, the PRL option produces 95% likelihood ratio confidence limits. The coverage of the confidence interval is controlled by the ALPHA= option in the FIT statement. START= ( parameter values . . . )
supplies starting values for the parameter estimates. In PROC MODEL, you have several options to specify starting values for the parameters to be estimated. When more than one option is specified, the options are implemented in the following order of precedence (from highest to lowest): the START= option, the PARMS statement initialization value, the ESTDATA= option, and the PARMSDATA= option. If no options are specified for the starting value, the default value of 0.0001 is used. If the START= option specifies more than one starting value for one or more parameters, a grid search is performed over all combinations of the values, and the best combination is used to start the iterations. For more information, see the STARTITER= option.
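For example, the following sketch (the equation name y and the parameters a and b are assumptions) requests a grid search over six starting-value combinations, with two minimization iterations performed at each grid point before the best combination is chosen:

   fit y / start=( a 0 1 2, b 0.5 1 ) startiter=2;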
Options to Control the Estimation Method Used ADJSMMV
specifies adding the variance adjustment from simulating the moments to the variancecovariance matrix of the parameter estimators. By default, no adjustment is made.
COVBEST=GLS | CROSS | FDA
specifies the variance-covariance estimator used for FIML. COVBEST=GLS selects the generalized least squares estimator. COVBEST=CROSS selects the crossproducts estimator. COVBEST=FDA selects the inverse of the finite difference approximation to the Hessian. The default is COVBEST=CROSS. DYNAMIC
specifies dynamic estimation of ordinary differential equations. See the section “Ordinary Differential Equations” on page 1116 for more details. FIML
specifies full information maximum likelihood estimation. GINV=G2 | G4
specifies the type of generalized inverse to be used when computing the covariance matrix. G4 selects the Moore-Penrose generalized inverse. The default is GINV=G2. Rather than deleting linearly related rows and columns of the covariance matrix, the Moore-Penrose generalized inverse averages the variance effects between collinear rows. When the option GINV=G4 is used, the Moore-Penrose generalized inverse is used to calculate standard errors and the covariance matrix of the parameters as well as the change vector for the optimization problem. For singular systems, a normal G2 inverse is used to determine the singular rows so that the parameters can be marked in the parameter estimates table. A G2 inverse is calculated by satisfying the first two properties of the Moore-Penrose generalized inverse; that is, $AA^{+}A = A$ and $A^{+}AA^{+} = A^{+}$. Whether or not you use a G4 inverse, if the covariance matrix is singular, the parameter estimates are not unique. Refer to Noble and Daniel (1977, pp. 337–340) for more details about generalized inverses. GENGMMV
specifies the GMM variance under an arbitrary weighting matrix. See the section “Estimation Methods” on page 1057 for more details. This is the default method for GMM estimation. GMM
specifies generalized method of moments estimation. HCCME= 0 | 1 | 2 | 3 | NO
specifies the type of heteroscedasticity-consistent covariance matrix estimator to use for OLS, 2SLS, 3SLS, SUR, and the iterated versions of these estimation methods. The number corresponds to the type of covariance matrix estimator to use:

$$HC_0: \quad \hat{\epsilon}_t^2$$

$$HC_1: \quad \frac{n}{n - df}\,\hat{\epsilon}_t^2$$

$$HC_2: \quad \hat{\epsilon}_t^2 / (1 - \hat{h}_t)$$

$$HC_3: \quad \hat{\epsilon}_t^2 / (1 - \hat{h}_t)^2$$

The default is NO.
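For example, the following sketch (the equation name y is an assumption) requests OLS estimation with the HC3 heteroscedasticity-consistent covariance estimator:

   fit y / ols hccme=3;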
ITGMM
specifies iterated generalized method of moments estimation. ITOLS
specifies iterated ordinary least squares estimation. This is the same as OLS unless there are cross-equation parameter restrictions. ITSUR
specifies iterated seemingly unrelated regression estimation. IT2SLS
specifies iterated two-stage least squares estimation. This is the same as 2SLS unless there are cross-equation parameter restrictions. IT3SLS
specifies iterated three-stage least squares estimation. KERNEL=(PARZEN | BART | QS, < c > , < e > ) KERNEL=PARZEN | BART | QS
specifies the kernel to be used for GMM and ITGMM. PARZEN selects the Parzen kernel, BART selects the Bartlett kernel, and QS selects the quadratic spectral kernel. $e \ge 0$ and $c \ge 0$ are used to compute the bandwidth parameter. The default is KERNEL=(PARZEN, 1, 0.2). See the section “Estimation Methods” on page 1057 for more details.
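For example, the following sketch (the equation name y is an assumption) requests GMM estimation with a Bartlett kernel and nondefault bandwidth constants:

   fit y / gmm kernel=( bart, 1, 0.2 );

N2SLS | 2SLS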
specifies nonlinear two-stage least squares estimation. This is the default when an INSTRUMENTS statement is used. N3SLS | 3SLS
specifies nonlinear three-stage least squares estimation. NDRAW < =number of draws >
requests the simulation method for estimation. H is the number of draws. If number of draws is not specified, the default H is set to 10. NOOLS NO2SLS
specifies bypassing OLS or 2SLS to get initial parameter estimates for GMM, ITGMM, or FIML. This is important for certain models that are poorly defined in OLS or 2SLS, or if good initial parameter values are already provided. Note that for GMM, the V matrix is created by using the initial values specified and this might not be consistently estimated. NO3SLS
specifies not to use 3SLS automatically for FIML initial parameter starting values. NOGENGMMV
specifies not to use the GMM variance under an arbitrary weighting matrix; the GMM variance under the optimal weighting matrix is used instead. See the section “Estimation Methods” on page 1057 for more details.
NPREOBS =number of obs to initialize
specifies the initial number of observations to run the simulation before the simulated values are compared to observed variables. This option is most useful in cases where the program statements involve lag operations. Use this option to avoid the effect of the starting point on the simulation. NVDRAW =number of draws for V matrix
specifies $H'$, the number of draws for the V matrix. If this option is not specified, the default $H'$ is set to 20. OLS
specifies ordinary least squares estimation. This is the default. SUR
specifies seemingly unrelated regression estimation. VARDEF=N | WGT | DF | WDF
specifies the denominator to be used in computing variances and covariances, MSE, root MSE measures, and so on. VARDEF=N specifies that the number of nonmissing observations be used. VARDEF=WGT specifies that the sum of the weights be used. VARDEF=DF specifies that the number of nonmissing observations minus the model degrees of freedom (number of parameters) be used. VARDEF=WDF specifies that the sum of the weights minus the model degrees of freedom be used. The default is VARDEF=DF. For FIML estimation the VARDEF= option does not affect the calculation of the parameter covariance matrix, which is determined by the COVBEST= option.
Data Set Options DATA=SAS-data-set
specifies the input data set. Values for the variables in the program are read from this data set. If the DATA= option is not specified on the FIT statement, the data set specified by the DATA= option on the PROC MODEL statement is used. ESTDATA=SAS-data-set
specifies a data set whose first observation provides initial values for some or all of the parameters. MISSING=PAIRWISE | DELETE
specifies how missing values are handled. MISSING=PAIRWISE specifies that missing values are tracked on an equation-by-equation basis. MISSING=DELETE specifies that the entire observation is omitted from the analysis when any equation has a missing predicted or actual value for the equation. The default is MISSING=DELETE. OUT=SAS-data-set
names the SAS data set to contain the residuals, predicted values, or actual values from each estimation. The residual values written to the OUT= data set are defined as actual − predicted, which is the negative of RESID.variable as defined in the section “Equation Translations” on page 1204. Only the residuals are output by default.
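For example, the following sketch (the equation name y and the data set name resids are assumptions) collects residuals, predicted values, and actual values in one output data set:

   fit y / out=resids outresid outpredict outactual;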
OUTACTUAL
writes the actual values of the endogenous variables of the estimation to the OUT= data set. This option is applicable only if the OUT= option is specified. OUTALL
selects the OUTACTUAL, OUTERRORS, OUTLAGS, OUTPREDICT, and OUTRESID options. OUTCOV COVOUT
writes the covariance matrix of the estimates to the OUTEST= data set in addition to the parameter estimates. The OUTCOV option is applicable only if the OUTEST= option is also specified. OUTEST=SAS-data-set
names the SAS data set to contain the parameter estimates and optionally the covariance of the estimates. OUTLAGS
writes the observations used to start the lags to the OUT= data set. This option is applicable only if the OUT= option is specified. OUTPREDICT
writes the predicted values to the OUT= data set. This option is applicable only if OUT= is specified. OUTRESID
writes the residual values computed from the parameter estimates to the OUT= data set. The OUTRESID option is the default if neither OUTPREDICT nor OUTACTUAL is specified. This option is applicable only if the OUT= option is specified. If the h.var equation is specified, the residual values written to the OUT= data set are the normalized residuals, defined as actual − predicted, divided by the square root of the h.var value. If the WEIGHT statement is used, the residual values are calculated as actual − predicted multiplied by the square root of the WEIGHT variable. OUTS=SAS-data-set
names the SAS data set to contain the estimated covariance matrix of the equation errors. This is the covariance of the residuals computed from the parameter estimates. OUTSN=SAS-data-set
names the SAS data set to contain the estimated normalized covariance matrix of the equation errors. This is valid for multivariate t distribution estimation. OUTSUSED=SAS-data-set
names the SAS data set to contain the S matrix used in the objective function definition. The OUTSUSED= data set is the same as the OUTS= data set for the methods that iterate the S matrix. OUTUNWGTRESID
writes the unweighted residual values computed from the parameter estimates to the OUT=
data set. These are residuals computed as actual − predicted, with no accounting for the WEIGHT statement, the _WEIGHT_ variable, or any variance expressions. This option is applicable only if the OUT= option is specified. OUTV=SAS-data-set
names the SAS data set to contain the estimate of the variance matrix for GMM and ITGMM. SDATA=SAS-data-set
specifies a data set that provides the covariance matrix of the equation errors. The matrix read from the SDATA= data set is used for the equation covariance matrix (S matrix) in the estimation. (The SDATA= S matrix is used to provide only the initial estimate of S for the methods that iterate the S matrix.) TIME=name
specifies the name of the time variable. This variable must be in the data set. TYPE=name
specifies the estimation type to read from the SDATA= and ESTDATA= data sets. The name specified in the TYPE= option is compared to the _TYPE_ variable in the ESTDATA= and SDATA= data sets to select observations to use in constructing the covariance matrices. When the TYPE= option is omitted, the last estimation type in the data set is used. Valid values are the estimation methods used in PROC MODEL. VDATA=SAS-data-set
specifies a data set that contains a variance matrix for GMM and ITGMM estimation. See the section “Output Data Sets” on page 1160 for details.
Printing Options for FIT Tasks BREUSCH=( variable-list )
specifies the modified Breusch-Pagan test, where variable-list is a list of variables used to model the error variance. CHOW=obs CHOW=(obs1 obs2 . . . obsn)
prints the Chow test for break points or structural changes in a model. The argument is the number of observations in the first sample or a parenthesized list of first sample sizes. If the size of one of the two groups into which the sample is partitioned is less than the number of parameters, then a predictive Chow test is automatically used. See the section “Chow Tests” on page 1131 for details.
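For example, the following sketch (the equation name y is an assumption) requests Chow tests for breaks after observations 40 and 60 and a predictive Chow test with the first 90 observations as the base sample:

   fit y / chow=(40 60) pchow=90;

COLLIN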
prints collinearity diagnostics for the Jacobian crossproducts matrix (XPX) after the parameters have converged. Collinearity diagnostics are also automatically printed if the estimation fails to converge. CORR
prints the correlation matrices of the residuals and parameters. Using CORR is the same as using both CORRB and CORRS.
CORRB
prints the correlation matrix of the parameter estimates. CORRS
prints the correlation matrix of the residuals. COV
prints the covariance matrices of the residuals and parameters. Specifying COV is the same as specifying both COVB and COVS. COVB
prints the covariance matrix of the parameter estimates. COVS
prints the covariance matrix of the residuals. DW < = >
prints Durbin-Watson d statistics, which measure autocorrelation of the residuals. When the residual series is interrupted by missing observations, the Durbin-Watson statistic calculated is d 0 as suggested by Savin and White (1978). This is the usual Durbin-Watson computed by ignoring the gaps. Savin and White show that it has the same null distribution as the DW with no gaps in the series and can be used to test for autocorrelation using the standard tables. The Durbin-Watson statistic is not valid for models that contain lagged endogenous variables. You can use the DW= option to request higher-order Durbin-Watson statistics. Since the ordinary Durbin-Watson statistic tests only for first-order autocorrelation, the Durbin-Watson statistics for higher-order autocorrelation are called generalized Durbin-Watson statistics. DWPROB
prints the significance level (p-values) for the Durbin-Watson tests. Since the Durbin-Watson p-values are computationally expensive, they are not reported by default. In the Durbin-Watson test, the null hypothesis is that there is no autocorrelation at a specific lag. See the section “Generalized Durbin-Watson Tests” in Chapter 8, “The AUTOREG Procedure,” for limitations of the statistic.
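For example, the following sketch (the equation name y is an assumption) requests generalized Durbin-Watson statistics up to order 4 together with their p-values:

   fit y / dw=4 dwprob;

FSRSQ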
prints the first-stage $R^2$ statistics for instrumental estimation methods. These $R^2$ statistics measure the proportion of the variance retained when the Jacobian columns associated with the parameters are projected through the instruments space. GODFREY GODFREY=n
performs Godfrey’s tests for autocorrelated residuals for each equation, where n is the maximum autoregressive order, and specifies that Godfrey’s tests be computed for lags 1 through n. The default number of lags is one. HAUSMAN
performs Hausman’s specification test, or m-statistics.
NORMAL
performs tests of normality of the model residuals. PCHOW=obs PCHOW=(obs1 obs2 . . . obsn)
prints the predictive Chow test for break points or structural changes in a model. The argument is the number of observations in the first sample or a parenthesized list of first sample sizes. See the section “Chow Tests” on page 1131 for details. PRINTALL
specifies the printing options COLLIN, CORRB, CORRS, COVB, COVS, DETAILS, DW, and FSRSQ. WHITE
specifies White’s test.
Options to Control Iteration Output Details of the output produced are discussed in the section “Iteration History” on page 1092. I
prints the inverse of the crossproducts Jacobian matrix at each iteration. ITALL
specifies all iteration printing-control options (I, ITDETAILS, ITPRINT, and XPX). ITALL also prints the crossproducts matrix (labeled CROSS), the parameter change vector, and the estimate of the cross-equation covariance of residuals matrix at each iteration. ITDETAILS
prints a detailed iteration listing. This includes the ITPRINT information and additional statistics. ITPRINT
prints the parameter estimates, objective function value, and convergence criteria at each iteration. XPX
prints the crossproducts Jacobian matrix at each iteration.
Options to Control the Minimization Process The following options can be helpful when you experience a convergence problem: CONVERGE=value1 CONVERGE=(value1, value2)
specifies the convergence criteria. The convergence measure must be less than value1 before convergence is assumed. value2 is the convergence criterion for the S and V matrices for S
and V iterated methods. value2 defaults to value1. See the section “Convergence Criteria” on page 1078 for details. The default value is CONVERGE=0.001. HESSIAN=CROSS | GLS | FDA
specifies the Hessian approximation used for FIML. HESSIAN=CROSS selects the crossproducts approximation to the Hessian, HESSIAN=GLS selects the generalized least squares approximation to the Hessian, and HESSIAN=FDA selects the finite difference approximation to the Hessian. HESSIAN=GLS is the default. LTEBOUND=n
specifies the local truncation error bound for the integration. This option is ignored if no ordinary differential equations (ODEs) are specified. EPSILON =value
specifies the tolerance value used to transform strict inequalities into inequalities when restrictions on parameters are imposed. By default, EPSILON=1E–8. See the section “Restrictions and Bounds on Parameters” on page 1126 for details. MAXITER=n
specifies the maximum number of iterations allowed. The default is MAXITER=100. MAXSUBITER=n
specifies the maximum number of subiterations allowed for an iteration. For the GAUSS method, the MAXSUBITER= option limits the number of step halvings. For the MARQUARDT method, the MAXSUBITER= option limits the number of times the Marquardt parameter $\lambda$ can be increased. The default is MAXSUBITER=30. See the section “Minimization Methods” on page 1077 for details. METHOD=GAUSS | MARQUARDT
specifies the iterative minimization method to use. METHOD=GAUSS specifies the GaussNewton method, and METHOD=MARQUARDT specifies the Marquardt-Levenberg method. The default is METHOD=GAUSS. If the default GAUSS method fails to converge, the procedure switches to the MARQUARDT method. See the section “Minimization Methods” on page 1077 for details. MINTIMESTEP=n
specifies the smallest allowed time step to be used in the integration. This option is ignored if no ODEs are specified. NESTIT
changes the way the iterations are performed for estimation methods that iterate the estimate of the equation covariance (S matrix). The NESTIT option is relevant only for the methods that iterate the estimate of the covariance matrix (ITGMM, ITOLS, ITSUR, IT2SLS, and IT3SLS). See the section “Details on the Covariance of Equation Errors” on page 1076 for an explanation of NESTIT. SINGULAR=value
specifies the smallest pivot value allowed. The default is SINGULAR=1.0E–12.
STARTITER=n
specifies the number of minimization iterations to perform at each grid point. The default is STARTITER=0, which implies that no minimization is performed at the grid points. See the section “Using the STARTITER Option” on page 1085 for more details.
Other Options Other options that can be used on the FIT statement include the following that list and analyze the model: BLOCK, GRAPH, LIST, LISTCODE, LISTDEP, LISTDER, and XREF. The following printing control options are also available: DETAILS, FLOW, INTGPRINT, MAXERRORS=, NOPRINT, PRINTALL, and TRACE. For complete descriptions of these options, see the discussion of the PROC MODEL statement options earlier in this chapter.
ID Statement ID variables ;
The ID statement specifies variables to identify observations in error messages or other listings and in the OUT= data set. The ID variables are normally SAS date or datetime variables. If more than one ID variable is used, the first variable is used to identify the observations; the remaining variables are added to the OUT= data set.
INCLUDE Statement INCLUDE model-names . . . ;
The INCLUDE statement reads model files and inserts their contents into the current model. However, instead of replacing the current model as the RESET MODEL= option does, the contents of included model files are inserted into the model program at the position that the INCLUDE statement appears.
INSTRUMENTS Statement INSTRUMENTS variables < _EXOG_ > ; INSTRUMENTS < variables-list > < _EXOG_ > < EXCLUDE =( parameters ) > < / options > ; INSTRUMENTS (equation, variables) (equation, variables) . . . ;
The INSTRUMENTS statement specifies the instrumental variables to be used in the N2SLS, N3SLS, IT2SLS, IT3SLS, GMM, and ITGMM estimation methods.
There are three ways of specifying the INSTRUMENTS statement. The first form of the INSTRUMENTS statement is declared before a FIT statement and defines the default instruments list. The items specified as instruments can be variables or the special keyword _EXOG_. The keyword _EXOG_ indicates that all the model variables declared EXOGENOUS are to be added to the instruments list. If a single INSTRUMENTS statement of the first form is declared before multiple FIT statements, then it serves as the default instruments list for each of the FIT statements. However, if any of these FIT statements is followed by a separate INSTRUMENTS statement, the latter takes precedence over the default list. Hence, in the case of multiple FIT statements, the INSTRUMENTS statement for a particular FIT statement is written below the FIT statement if instruments other than the default are required. For a single FIT statement, you can declare the INSTRUMENTS statement of the first form either preceding or following the FIT statement.

The second form of the INSTRUMENTS statement is used only after the FIT statement and before the next RUN statement. The items specified as instruments for the second form can be variables, names of parameters to be estimated, or the special keyword _EXOG_. If you specify the name of a parameter in the instruments list, the partial derivatives of the equations with respect to the parameter (that is, the columns of the Jacobian matrix associated with the parameter) are used as instruments. The parameter itself is not used as an instrument. These partial derivatives should not depend on any of the parameters to be estimated. Only the names of parameters to be estimated can be specified. Note that an INSTRUMENTS statement of only the first form declared before multiple FIT statements serves as the default instruments list. Hence, for multiple as well as single FIT statements, you can declare the second form of the INSTRUMENTS statement only following the FIT statement. If a FIT statement is erroneously preceded by an INSTRUMENTS statement of the second form and is not followed by any INSTRUMENTS statement, the default list is used. This default list is given by the INSTRUMENTS statement of the first form, as explained above. If such a list is not declared, all the model variables declared EXOGENOUS comprise the default.

A third form of the INSTRUMENTS statement is used to specify instruments for each equation. No explicit intercept is added, parameters cannot be specified to represent instruments, and the _EXOG_ keyword is not allowed. Equations not explicitly assigned instruments use all the instruments specified for the other equations as well as instruments not assigned specific equations. In the following statements, z1, z2, and z3 are instruments used with equation y1, and z2, z3, and z4 are instruments used with equation y2.

   proc model data=data_sim;
      exogenous x1 x2;
      parms a b c d e f;

      y1 = a*x1**2 + b*x2**2 + c*x1*x2;
      y2 = d*x1**2 + e*x2**2 + f*x1*x2**2;

      fit y1 y2 / 3sls;
      instruments (y1, z1 z2 z3) (y2, z2 z3 z4);
   run;
EXCLUDE=(parameters)
specifies that the derivatives of the equations with respect to all of the parameters to be estimated (except the parameters listed in the EXCLUDE list) be used as instruments, in
addition to the other instruments specified. If you use the EXCLUDE= option, you should be sure that the derivatives with respect to the nonexcluded parameters in the estimation are independent of the endogenous variables and not functions of the parameters estimated. The following options can be specified on the INSTRUMENTS statement following a slash (/): NOINTERCEPT NOINT
excludes the constant of 1.0 (intercept) from the instruments list. An intercept is included as an instrument while using the first or second form of the INSTRUMENTS statement unless NOINTERCEPT is specified. When a FIT statement specifies an instrumental variables estimation method and no INSTRUMENTS statement accompanies the FIT statement, the default instruments are used. If no default instruments list has been specified, all the model variables declared EXOGENOUS are used as instruments. See the section “Choice of Instruments” on page 1134 for more details. INTONLY
specifies that only the intercept be used as an instrument. This option is used for GMM estimation where the moments have been specified explicitly.
LABEL Statement LABEL variable=’label’ . . . ;
The LABEL statement specifies a label of up to 255 characters for parameters and other variables used in the model program. Labels are used to identify parts of the printout of FIT and SOLVE tasks. The labels are displayed in the output if the LINESIZE= option is large enough.
MOMENT Statement MOMENT variables = moment specification ;
In many scenarios, endogenous variables are observed from data. From the models, you can simulate these endogenous variables based on a fixed set of parameters. The goal of simulated method of moments (SMM) is to find a set of parameters such that the moments of the simulated data match the moments of the observed variables. If there are many moments to match, the code might be tedious. The following MOMENT statement provides a way to generate some commonly used moments automatically. Multiple MOMENT statements can be used. variables can be one or more endogenous variables. moment specification can have the following four types: ( number list ) specifies that the endogenous variable is raised to the power specified by each number in number list. For example,
moment y = (2 3);
adds the following two equations to be estimated:

   eq._moment_1 = y**2 - pred.y**2;
   eq._moment_2 = y**3 - pred.y**3;
ABS( number list ) specifies that the absolute value of the endogenous variable is raised to the power specified by each number in number list. For example, moment y = ABS(3);
adds the following equation to be estimated: eq._moment_2 = abs(y)**3 - abs(pred.y)**3;
LAGnum ( number list ) specifies that the endogenous variable is multiplied by the num th lag of the endogenous variable, and this product is raised to the power specified by each number in number list. For example, moment y = LAG4(3);
adds the following equation to be estimated: eq._moment_3 = (y*lag4(y))**3 - (pred.y*lag4(pred.y))**3;
ABS_LAGnum ( number list ) specifies that the endogenous variable is multiplied by the num th lag of the endogenous variable, and the absolute value of this product is raised to the power specified by each number in number list. For example, moment y = ABS_LAG4(3);
adds the following equation to be estimated: eq._moment_4 = abs(y*lag4(y))**3 - abs(pred.y*lag4(pred.y))**3;
The following PROC MODEL statements use the MOMENT statement to generate 24 moments and fit these moments using SMM:

   proc model data=_tmpdata list;
      parms a b .5 s 1;
      instrument _exog_ / intonly;

      u = rannor( 10091 );
      z = rannor( 97631 );

      lsigmasq = xlag( sigmasq, exp(a) );

      lnsigmasq = a + b * log( lsigmasq ) + s * u;
      sigmasq = exp( lnsigmasq );
      y = sqrt( sigmasq ) * z;

      moment y = (2 4) abs(1 3) abs_lag1(1 2) abs_lag2(1 2);
      moment y = abs_lag3(1 2) abs_lag4(1 2)
                 abs_lag5(1 2) abs_lag6(1 2)
                 abs_lag7(1 2) abs_lag8(1 2)
                 abs_lag9(1 2) abs_lag10(1 2);

      fit y / gmm npreobs=20 ndraw=10;
      bounds s > 0, 1 > b > 0;
   run;
OUTVARS Statement OUTVARS variables ;
The OUTVARS statement specifies additional variables defined in the model program to be output to the OUT= data sets. The OUTVARS statement is not needed unless the variables to be added to the output data set are not referred to by the model, or unless you want to include parameters or other special variables in the OUT= data set. The OUTVARS statement includes additional variables, whereas the KEEP statement excludes variables.
PARAMETERS Statement PARAMETERS variable < value > < variable < value > > . . . ;
The PARAMETERS statement declares the parameters of a model and optionally sets their initial values. Valid abbreviations are PARMS and PARM. Each parameter has a single value associated with it, which is the same for all observations. Lagging is not relevant for parameters. If a value is not specified in the PARMS statement (or by the PARMS= option of a FIT statement), the value defaults to 0.0001 for FIT tasks and to a missing value for SOLVE tasks.
Programming Statements To define the model, you can use most of the programming statements that are allowed in the SAS DATA step. See the SAS Language Reference: Dictionary for more information.
RANGE Statement RANGE variable < = first > < TO last > ;
The RANGE statement specifies the range of observations to be read from the DATA= data set. For FIT tasks, the RANGE statement controls the period of fit for the estimation. For SOLVE tasks, the RANGE statement controls the simulation period or forecast horizon. The RANGE variable must be a numeric variable in the DATA= data set that identifies the observations, and the data set must be sorted by the RANGE variable. The first observation in the range is identified by first, and the last observation is identified by last. PROC MODEL uses the first l observations prior to first to initialize the lags, where l is the maximum number of lags needed to evaluate any of the equations to be fit or solved, or the maximum number of lags needed to compute any of the instruments when an instrumental variables estimation method is used. There should be at least l observations in the data set before first. If last is not specified, all the nonmissing observations starting with first are used. If first is omitted, the first l observations are used to initialize the lags, and the rest of the data, until last, is used. If a RANGE statement is used but both first and last are omitted, the RANGE statement variable is used to report the range of observations processed. The RANGE variable should be nonmissing for all observations. Observations that contain missing RANGE values are deleted. The following are examples of RANGE statements:

   range year = 1971 to 1988;              /* yearly data                */
   range date = '1feb73'd to '1nov82'd;    /* monthly data               */
   range time = 60.5;                      /* time in years              */
   range year to 1977;                     /* use all years through 1977 */
   range date;            /* use values of date to report period-of-fit  */
If no RANGE statements follow multiple FIT statements and a single RANGE statement is declared before all the FIT statements, estimation in each of the multiple FIT statements is based on the data specified in the single RANGE statement. A single RANGE statement following multiple FIT statements affects only the fit immediately preceding it. If the FIT statement is both followed by and preceded by RANGE statements, the following RANGE statement takes precedence over the preceding RANGE statement. To apply a range of data to a particular SOLVE task, specify the RANGE statement following that SOLVE statement; this holds whether there are single or multiple SOLVE statements.
RESET Statement RESET options ;
All of the options of the PROC MODEL statement can be reset by the RESET statement. In addition, the RESET statement supports one additional option: PURGE
deletes the current model so that a new model can be defined. When the MODEL= option is used in the RESET statement, the current model is deleted before the new model is read.
RESTRICT Statement RESTRICT restriction1 < , restriction2 . . . > ;
The RESTRICT statement is used to impose linear and nonlinear restrictions on the parameter estimates. RESTRICT statements refer to the parameters estimated by the associated FIT statement (that is, to either the preceding FIT statement or, in the absence of a preceding FIT statement, to the following FIT statement). You can specify any number of RESTRICT statements. Each restriction is written as an optional name, followed by an expression, followed by an equality operator (=) or an inequality operator (, =), followed by a second expression: < "name" > expression operator expression
The optional "name" is a string used to identify the restriction in the printed output and in the OUTEST= data set. The operator can be =, <, <=, >, or >=. The operator and second expression are optional. Restriction expressions can be composed of parameter names, arithmetic operators, functions, and constants. Comparison operators (such as = or <) and logical operators (such as &) cannot be used in RESTRICT statement expressions.
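For example, the following sketch (the restriction names and the parameters a, b, and c are assumptions) imposes one equality and one inequality restriction:

   restrict 'sum to one' a + b + c = 1,
            'b nonneg'   b >= 0;

SOLVE Statement SOLVE < variables > < SATISFY=( equations ) > < / options > ;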
The SOLVE statement specifies that the model be simulated or forecast for input data values and, optionally, selects the variables to be solved. If the list of variables is omitted, all of the model variables declared ENDOGENOUS are solved. If no model variables are declared ENDOGENOUS, then all model variables are solved. The following specification can be used in the SOLVE statement: SATISFY=equation SATISFY=( equations )
specifies a subset of the model equations that the solution values are to satisfy. If the SATISFY= option is not used, the solution is computed to satisfy all the model equations. Note that the number of equations must equal the number of variables solved.
Data Set Options DATA=SAS-data-set
names the input data set. The model is solved for each observation read from the DATA= data set. If the DATA= option is not specified on the SOLVE statement, the data set specified by the DATA= option in the PROC MODEL statement is used. ESTDATA=SAS-data-set
names a data set whose first observation provides values for some or all of the parameters and whose additional observations (if any) give the covariance matrix of the parameter estimates. The covariance matrix read from the ESTDATA= data set is used to generate multivariate normal pseudo-random shocks to the model parameters when the RANDOM= option requests Monte Carlo simulation. OUT=SAS-data-set
outputs the predicted (solution) values, residual values, actual values, or equation errors from the solution to a data set. The residual values are the actual − predicted values, which is the negative of RESID.variable as defined in the section “Equation Translations” on page 1204. Only the solution values are output by default.
outputs the actual values of the solved variables read from the input data set to the OUT= data set. This option is applicable only if the OUT= option is specified.
OUTALL
specifies the OUTACTUAL, OUTERRORS, OUTLAGS, OUTPREDICT, and OUTRESID options. OUTERRORS
writes the equation errors to the OUT= data set. These values are normally very close to zero when a simultaneous solution is computed; they can be used to double-check the accuracy of the solution process. It is applicable only if the OUT= option is specified. OUTLAGS
writes the observations used to start the lags to the OUT= data set. This option is applicable only if the OUT= option is specified. OUTPREDICT
writes the solution values to the OUT= data set. This option is relevant only if the OUT= option is specified. The OUTPREDICT option is the default unless one of the other output options is used. OUTRESID
writes the residual values, computed as actual − predicted; these are not the same as the RESID.variable values. This option is applicable only if the OUT= option is specified. PARMSDATA=SAS-data-set
specifies a data set that contains the parameter estimates. See the section “Input Data Sets” on page 1154 for more details. RESIDDATA=SAS-data-set
specifies a data set that contains the residuals that are to be used in the empirical distribution. This data set can be created using the OUT= option on the FIT statement. SDATA=SAS-data-set
specifies a data set that provides the covariance matrix of the equation errors. The covariance matrix read from the SDATA= data set is used to generate multivariate normal pseudo-random shocks to the equations when the RANDOM= option requests Monte Carlo simulation. TIME=name
specifies the name of the time variable. This variable must be in the data set. TYPE=name
specifies the estimation type. The name specified in the TYPE= option is compared to the _TYPE_ variable in the ESTDATA= and SDATA= data sets to select observations to use in constructing the covariance matrices. When TYPE= is omitted, the last estimation type in the data set is used.
Solution Mode Options: Lag Processing DYNAMIC
specifies a dynamic solution. In the dynamic solution mode, solved values are used by the lagging functions. DYNAMIC is the default.
NAHEAD=n
specifies a simulation of n-period-ahead dynamic forecasting. The NAHEAD= option is used to simulate the process of using the model to produce successive forecasts to a fixed forecast horizon, in which each forecast uses the historical data available at the time the forecast is made. Note that NAHEAD=1 produces a static (one-step-ahead) solution. NAHEAD=2 produces a solution that uses one-step-ahead solutions for the first lag (LAG1 functions return static predicted values) and actual values for longer lags. NAHEAD=3 produces a solution that uses NAHEAD=2 solutions for the first lags, NAHEAD=1 solutions for the second lags, and actual values for longer lags. In general, NAHEAD=n solutions use NAHEAD=n–1 solutions for LAG1, NAHEAD=n–2 solutions for LAG2, and so forth. START=s
specifies static solutions until the sth observation and then changes to dynamic solutions. If the START=s option is specified, the first observation in the range in which LAGn delivers solved predicted values is s+n, while LAGn returns actual values for earlier observations. STATIC
specifies a static solution. In static solution mode, actual values of the solved variables from the input data set are used by the lagging functions.
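For example, the following sketch (the variable y and the data set names sim1 and fore2 are assumptions) simulates two-period-ahead forecasts over the input data:

   solve y / nahead=2 data=sim1 out=fore2;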
Solution Mode Options: Use of Available Data FORECAST
specifies that the actual value of a solved variable is used as the solution value (instead of the predicted value from the model equations) whenever nonmissing data are available in the input data set. That is, in FORECAST mode, PROC MODEL solves only for those variables that are missing in the input data set. SIMULATE
specifies that PROC MODEL always solves for all solution variables as a function of the input values of the other variables, even when actual data for some of the solution variables are available in the input data set. SIMULATE is the default.
Solution Mode Options: Numerical Solution Method JACOBI
computes a simultaneous solution using a Jacobi iteration. NEWTON
computes a simultaneous solution by using Newton’s method. When the NEWTON option is selected, the analytic derivatives of the equation errors with respect to the solution variables are computed, and memory-efficient sparse matrix techniques are used for factoring the Jacobian matrix. The NEWTON option can be used to solve both normalized-form and general-form equations and can compute goal-seeking solutions. NEWTON is the default.
SEIDEL
computes a simultaneous solution by using a Gauss-Seidel method. SINGLE ONEPASS
specifies a single-equation (nonsimultaneous) solution. The model is executed once to compute predicted values for the variables from the actual values of the other endogenous variables. The SINGLE option can be used only for normalized-form equations and cannot be used for goal-seeking solutions. For more information on these options, see the section “Solution Modes” on page 1166.
Monte Carlo Simulation Options COPULA=(NORMAL | NORMALMIX( n, p1 . . . pn , v1 . . . vn ) | T(df ) < ASYM > )
specifies the copula to be used in the simulation. The normal (Gaussian) copula is the default. The copula applies to the covariance of equation errors. PSEUDO=DEFAULT | TWISTER

specifies which pseudo-random number generator is to be used in generating draws for Monte Carlo simulation. The two pseudo-random number generators supported by the MODEL procedure are a default congruential generator, which has period $2^{31} - 1$, and the Mersenne twister pseudo-random number generator, which has an extraordinarily long period of $2^{19937} - 1$. QUASI=NONE|SOBOL|FAURE
specifies a pseudo- or quasi-random number generator. Two quasi-random number generators are supported by the MODEL procedure: the Sobol sequence (QUASI=SOBOL) and the Faure sequence (QUASI=FAURE). The default is QUASI=NONE, which is the pseudo-random number generator. RANDOM=n
repeats the solution n times for each BY group, with different random perturbations of the equation errors if the SDATA= option is used; with different random perturbations of the parameters if the ESTDATA= option is used and the ESTDATA= data set contains a parameter covariance matrix; and with different values returned from the random number generator functions, if any are used in the model program. If RANDOM=0, the random number generator functions always return zero. See the section “Monte Carlo Simulation” on page 1170 for details. The default is RANDOM=0. SEED=n
specifies an integer to use as the seed in generating pseudo-random numbers to shock the parameters and equations when the ESTDATA= or the SDATA= options are specified. If n is negative or zero, the time of day from the computer’s clock is used as the seed. The SEED= option is relevant only if the RANDOM= option is used. The default is SEED=0. WISHART=df
specifies that a Wishart distribution with degrees of freedom df be used in place of the normal error covariance matrix. This option is used to model the variance of the error covariance matrix when Monte Carlo simulation is selected.
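The following sketch combines several of these options (the variable y and the data set names est, s, and sim are assumptions): it runs 100 Monte Carlo replications, shocking the parameters from the ESTDATA= covariance matrix and the equations from the SDATA= covariance matrix, with a fixed seed for reproducibility:

   solve y / random=100 seed=20100501 estdata=est sdata=s out=sim;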
Options for Controlling the Numerical Solution Process The following options are useful when you have difficulty converging to the simultaneous solution. CONVERGE=value
specifies the convergence criterion for the simultaneous solution. Convergence of the solution is judged by comparing the CONVERGE= value to the maximum over the equations of

$$\frac{|\epsilon_i|}{|y_i| + 10^{-6}}$$

if they are computable, otherwise $|\epsilon_i|$, where $\epsilon_i$ represents the equation error and $y_i$ represents the solution variable that corresponds to the $i$th equation for normalized-form equations. The default is CONVERGE=1E–8.
specifies the maximum number of iterations allowed for computing the simultaneous solution for any observation. The default is MAXITER=50. MAXSUBITER=n
specifies the maximum number of damping subiterations that are performed in solving a nonlinear system when using the NEWTON solution method. Damping is disabled by setting MAXSUBITER=0. The default is MAXSUBITER=10.
Printing Options INTGPRINT
prints between data points integration values for the DERT. variables and the auxiliary variables. If you specify the DETAILS option, the integrated derivative variables are printed as well. ITPRINT
prints the solution approximation and equation errors at each iteration for each observation. This option can produce voluminous output. PRINTALL
specifies the printing control options DETAILS, ITPRINT, SOLVEPRINT, STATS, and THEIL. SOLVEPRINT
prints the solution values and residuals at each observation. STATS
prints various summary statistics for the solution values. THEIL
prints tables of Theil inequality coefficients and Theil relative change forecast error measures for the solution values. See the section “Summary Statistics” on page 1184 for more information.
Other Options Other options that can be used on the SOLVE statement include the following that list and analyze the model: BLOCK, GRAPH, LIST, LISTCODE, LISTDEP, LISTDER, and XREF. The LTEBOUND= and MINTIMESTEP= options can be used to control the integration process. The following printingcontrol options are also available: DETAILS, FLOW, MAXERRORS=, NOPRINT, and TRACE. For complete descriptions of these options, see the PROC MODEL and FIT statement options described earlier in this chapter.
TEST Statement TEST < "name" > test1 < , test2 . . . > < ,/ options > ;
The TEST statement performs tests of nonlinear hypotheses on the model parameters. The TEST statement applies to the parameters estimated by the associated FIT statement (that is, either the preceding FIT statement or, in the absence of a preceding FIT statement, the following FIT statement). You can specify any number of TEST statements. If you specify options on the TEST statement, a comma is required before the “/” character that separates the test expressions from the options, because the “/” character can also be used within test expressions to indicate division. Labels for TEST and ESTIMATE statements can be up to 256 characters long; if a label exceeds this length, it is truncated to 256 characters and a note is printed to the log. Each test is written as an expression optionally followed by an equal sign (=) and a second expression: < expression > < = expression >
Test expressions can be composed of parameter names, arithmetic operators, functions, and constants. Comparison operators (such as =) and logical operators (such as &) cannot be used in TEST statement expressions. Parameters named in test expressions must be among the parameters estimated by the associated FIT statement. If you specify only one expression in a test, that expression is tested against zero. For example, the following two TEST statements are equivalent: test a + b; test a + b = 0;
When you specify multiple tests in the same TEST statement, a joint test is performed. For example, the following TEST statement tests the joint hypothesis that both A and B are equal to zero. test a, b;
To perform separate tests rather than a joint test, use separate TEST statements. For example, the following TEST statements test the two separate hypotheses that A is equal to zero and that B is equal to zero. test a; test b;
You can use the following options in the TEST statement. WALD
specifies that a Wald test be computed. WALD is the default. LM RAO LAGRANGE
specifies that a Lagrange multiplier test be computed. LR LIKE
specifies that a likelihood ratio test be computed. ALL
requests all three types of tests. OUT=SAS-data-set
specifies the name of an output SAS data set that contains the test results. The format of the OUT= data set produced by the TEST statement is similar to that of the OUTEST= data set produced by the FIT statement.
VAR Statement VAR variables < initial-values > . . . ;
The VAR statement declares model variables and optionally provides initial values for the lags of the variables. See the section “Lag Logic” on page 1210 for more information.
WEIGHT Statement WEIGHT variable ;
The WEIGHT statement specifies a variable to supply weighting values to use for each observation in estimating parameters.
If the weight of an observation is nonpositive, that observation is not used for the estimation. variable must be a numeric variable in the input data set. An alternative weighting method is to use an assignment statement to give values to the special variable _WEIGHT_. The _WEIGHT_ variable must not depend on the parameters being estimated. If both weighting specifications are given, the weights are multiplied together.
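For example, in the following sketch (the data set and variable names are assumptions) both weighting specifications are given, so each observation is weighted by the product w * (1/x**2):

   proc model data=in;
      parms a b;
      y = a + b*x;
      _weight_ = 1 / x**2;   /* assignment to the special _WEIGHT_ variable */
      fit y;
      weight w;              /* WEIGHT statement; the weights are multiplied together */
   run;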
Details: Estimation by the MODEL Procedure
Estimation Methods

Consider the general nonlinear model:

$$\epsilon_t = q(y_t, x_t, \theta)$$

$$z_t = Z(x_t)$$

where $q \in R^g$ is a real vector valued function of $y_t \in R^g$, $x_t \in R^l$, and $\theta \in R^p$; $g$ is the number of equations; $l$ is the number of exogenous variables (lagged endogenous variables are considered exogenous here); $p$ is the number of parameters; and $t$ ranges from 1 to $n$. $z_t \in R^k$ is a vector of instruments. $\epsilon_t$ is an unobservable disturbance vector with the following properties:

$$E(\epsilon_t) = 0$$

$$E(\epsilon_t \epsilon_t') = \Sigma$$

All of the methods implemented in PROC MODEL aim to minimize an objective function. The following table summarizes the objective functions that define the estimators and the corresponding estimator of the covariance of the parameter estimates for each method.

Table 18.2 Summary of PROC MODEL Estimation Methods

| Method | Instruments | Objective Function | Covariance of $\theta$ |
|--------|-------------|--------------------|------------------------|
| OLS    | no  | $r'r/n$ | $(X'(\mathrm{diag}(S)^{-1} \otimes I)X)^{-1}$ |
| ITOLS  | no  | $r'(\mathrm{diag}(S)^{-1} \otimes I)r/n$ | $(X'(\mathrm{diag}(S)^{-1} \otimes I)X)^{-1}$ |
| SUR    | no  | $r'(S_{\mathrm{OLS}}^{-1} \otimes I)r/n$ | $(X'(S^{-1} \otimes I)X)^{-1}$ |
| ITSUR  | no  | $r'(S^{-1} \otimes I)r/n$ | $(X'(S^{-1} \otimes I)X)^{-1}$ |
| N2SLS  | yes | $r'(I \otimes W)r/n$ | $(X'(\mathrm{diag}(S)^{-1} \otimes W)X)^{-1}$ |
| IT2SLS | yes | $r'(\mathrm{diag}(S)^{-1} \otimes W)r/n$ | $(X'(\mathrm{diag}(S)^{-1} \otimes W)X)^{-1}$ |
| N3SLS  | yes | $r'(S_{\mathrm{N2SLS}}^{-1} \otimes W)r/n$ | $(X'(S^{-1} \otimes W)X)^{-1}$ |
| IT3SLS | yes | $r'(S^{-1} \otimes W)r/n$ | $(X'(S^{-1} \otimes W)X)^{-1}$ |
| GMM    | yes | $[n\,m_n(\theta)]'\hat{V}_{\mathrm{N2SLS}}^{-1}[n\,m_n(\theta)]/n$ | $[(YX)'\hat{V}^{-1}(YX)]^{-1}$ |
| ITGMM  | yes | $[n\,m_n(\theta)]'\hat{V}^{-1}[n\,m_n(\theta)]/n$ | $[(YX)'\hat{V}^{-1}(YX)]^{-1}$ |
| FIML   | no  | $\mathrm{constant} + \frac{n}{2}\ln(\det(S)) - \sum_{t=1}^{n}\ln|(J_t)|$ | $[\hat{Z}'(S^{-1} \otimes I)\hat{Z}]^{-1}$ |
The column labeled “Instruments” identifies the estimation methods that require instruments. The variables used in this table and the remainder of this chapter are defined as follows:

$n$ is the number of nonmissing observations.

$g$ is the number of equations.

$k$ is the number of instrumental variables used.

$r = (r_1', r_2', \ldots, r_g')'$ is the $ng \times 1$ vector of residuals for the $g$ equations stacked together.

$r_i = (q_i(y_1, x_1, \theta), q_i(y_2, x_2, \theta), \ldots, q_i(y_n, x_n, \theta))'$ is the $n \times 1$ column vector of residuals for the $i$th equation.

$S$ is a $g \times g$ matrix that estimates $\Sigma$, the covariances of the errors across equations (referred to as the S matrix).

$X$ is an $ng \times p$ matrix of partial derivatives of the residual with respect to the parameters.

$W$ is an $n \times n$ matrix, $Z(Z'Z)^{-1}Z'$.

$Z$ is an $n \times k$ matrix of instruments.

$Y$ is a $gk \times ng$ matrix of instruments, $Y = I_g \otimes Z'$.

$\hat{Z} = (\hat{Z}_1, \hat{Z}_2, \ldots, \hat{Z}_p)$ is an $ng \times p$ matrix, where $\hat{Z}_i$ is an $ng \times 1$ column vector obtained from stacking the columns of

$$U \, \frac{1}{n} \sum_{t=1}^{n} \left( \frac{\partial q(y_t, x_t, \theta)'}{\partial y_t} \right)^{-1} \frac{\partial^2 q(y_t, x_t, \theta)'}{\partial y_t \, \partial \theta_i} \; - \; Q_i$$

$U$ is an $n \times g$ matrix of residual errors, $U = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)'$.

$Q$ is the $n \times g$ matrix $(q(y_1, x_1, \theta), q(y_2, x_2, \theta), \ldots, q(y_n, x_n, \theta))'$.

$Q_i$ is the $n \times g$ matrix $\partial Q / \partial \theta_i$.

$I$ is an $n \times n$ identity matrix.

$J_t$ is $\partial q(y_t, x_t, \theta) / \partial y_t'$, which is a $g \times g$ Jacobian matrix.

$m_n$ is the first moment of the crossproduct $q(y_t, x_t, \theta) \otimes z_t$, $m_n = \frac{1}{n} \sum_{t=1}^{n} q(y_t, x_t, \theta) \otimes z_t$.

$z_t$ is a $k$-dimensional column vector of instruments for observation $t$; $z_t'$ is also the $t$th row of $Z$.

$\hat{V}$ is the $gk \times gk$ matrix that represents the variance of the moment functions.

constant is the constant $\frac{ng}{2}(1 + \ln(2\pi))$.

$\otimes$ is the notation for a Kronecker product.
All vectors are column vectors unless otherwise noted. Other estimates of the covariance matrix for FIML are also available.
Dependent Regressors and Two-Stage Least Squares

Ordinary regression analysis is based on several assumptions. A key assumption is that the independent variables are in fact statistically independent of the unobserved error component of the model. If this assumption is not true (if the regressor varies systematically with the error), then ordinary regression produces inconsistent results. The parameter estimates are biased.

Regressors might fail to be independent variables because they are dependent variables in a larger simultaneous system. For this reason, the problem of dependent regressors is often called simultaneous equation bias. For example, consider the following two-equation system:

$$y_1 = a_1 + b_1 y_2 + c_1 x_1 + \epsilon_1$$

$$y_2 = a_2 + b_2 y_1 + c_2 x_2 + \epsilon_2$$

In the first equation, $y_2$ is a dependent, or endogenous, variable. As shown by the second equation, $y_2$ is a function of $y_1$, which by the first equation is a function of $\epsilon_1$, and therefore $y_2$ depends on $\epsilon_1$. Likewise, $y_1$ depends on $\epsilon_2$ and is a dependent regressor in the second equation. This is an example of a simultaneous equation system; $y_1$ and $y_2$ are a function of all the variables in the system.

Using the ordinary least squares (OLS) estimation method to estimate these equations produces biased estimates. One solution to this problem is to replace $y_1$ and $y_2$ on the right-hand side of the equations with predicted values, thus changing the regression problem to the following:

$$y_1 = a_1 + b_1 \hat{y}_2 + c_1 x_1 + \epsilon_1$$

$$y_2 = a_2 + b_2 \hat{y}_1 + c_2 x_2 + \epsilon_2$$

This method requires estimating the predicted values $\hat{y}_1$ and $\hat{y}_2$ through a preliminary, or “first stage,” instrumental regression. An instrumental regression is a regression of the dependent regressors on a set of instrumental variables, which can be any independent variables useful for predicting the dependent regressors. In this example, the equations are linear and the exogenous variables for the whole system are known. Thus, the best choice for instruments (of the variables in the model) are the variables $x_1$ and $x_2$.

This method is known as two-stage least squares or 2SLS, or more generally as the instrumental variables method. The 2SLS method for linear models is discussed in Pindyck and Rubinfeld (1981, pp. 191–192). For nonlinear models this situation is more complex, but the idea is the same. In nonlinear 2SLS, the derivatives of the model with respect to the parameters are replaced with predicted values. See the section “Choice of Instruments” on page 1134 for further discussion of the use of instrumental variables in nonlinear regression.

To perform nonlinear 2SLS estimation with PROC MODEL, specify the instrumental variables with an INSTRUMENTS statement and specify the 2SLS or N2SLS option in the FIT statement. The following statements show how to estimate the first equation in the preceding example with PROC MODEL:
proc model data=in;
   y1 = a1 + b1 * y2 + c1 * x1;
   fit y1 / 2sls;
   instruments x1 x2;
run;
The 2SLS or instrumental variables estimator can be computed by using a first-stage regression on the instrumental variables as described previously. However, PROC MODEL actually uses the equivalent but computationally more appropriate technique of projecting the regression problem into the linear space defined by the instruments. Thus, PROC MODEL does not produce any "first stage" results when you use 2SLS. If you specify the FSRSQ option in the FIT statement, PROC MODEL prints a "First-Stage R2" statistic for each parameter estimate.

Formally, the $\hat{\theta}$ that minimizes

$$\hat{S}_n = \frac{1}{n} \left( \sum_{t=1}^{n} q(y_t,x_t,\theta) \otimes z_t \right)' \left( \sum_{t=1}^{n} I \otimes z_t z_t' \right)^{-1} \left( \sum_{t=1}^{n} q(y_t,x_t,\theta) \otimes z_t \right)$$

is the N2SLS estimator of the parameters. The estimate of $\Sigma$ at the final iteration is used in the covariance of the parameters given in Table 18.2. See Amemiya (1985, p. 250) for details on the properties of nonlinear two-stage least squares.
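For instance, the following minimal sketch adds the first-stage R2 statistics to the 2SLS fit of the preceding example; the data set and variable names are the same hypothetical ones used above:

proc model data=in;
   y1 = a1 + b1 * y2 + c1 * x1;
   fit y1 / 2sls fsrsq;     /* FSRSQ prints a first-stage R2 for each parameter estimate */
   instruments x1 x2;
run;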
Seemingly Unrelated Regression

If the regression equations are not simultaneous (so there are no dependent regressors), seemingly unrelated regression (SUR) can be used to estimate systems of equations with correlated random errors. The large-sample efficiency of an estimation can be improved if these cross-equation correlations are taken into account. SUR is also known as joint generalized least squares or Zellner regression. Formally, the $\hat{\theta}$ that minimizes

$$\hat{S}_n = \frac{1}{n} \sum_{t=1}^{n} q(y_t,x_t,\theta)' \, \hat{\Sigma}^{-1} \, q(y_t,x_t,\theta)$$

is the SUR estimator of the parameters.

The SUR method requires an estimate of the cross-equation covariance matrix, $\Sigma$. PROC MODEL first performs an OLS estimation, computes an estimate, $\hat{\Sigma}$, from the OLS residuals, and then performs the SUR estimation based on $\hat{\Sigma}$. The OLS results are not printed unless you specify the OLS option in addition to the SUR option.

You can specify the $\hat{\Sigma}$ to use for SUR by storing the matrix in a SAS data set and naming that data set in the SDATA= option. You can also feed the $\hat{\Sigma}$ computed from the SUR residuals back into the SUR estimation process by specifying the ITSUR option. You can print the estimated covariance matrix $\hat{\Sigma}$ by using the COVS option in the FIT statement.

The SUR method requires estimation of the $\Sigma$ matrix, and this increases the sampling variability of the estimator for small sample sizes. The efficiency gain that SUR has over OLS is a large-sample property, and you must have a reasonable amount of data to realize this gain. For a more detailed discussion of SUR, see Pindyck and Rubinfeld (1981, pp. 331-333).
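The following is a minimal sketch of a SUR fit; the data set and variable names are hypothetical:

proc model data=in;
   y1 = a1 + b1 * x1;       /* first equation */
   y2 = a2 + b2 * x2;       /* second equation */
   fit y1 y2 / sur covs;    /* COVS prints the estimated covariance matrix */
run;

Replacing SUR with ITSUR in the FIT statement feeds the covariance matrix estimated from the SUR residuals back into the estimation process, as described above.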
Three-Stage Least Squares Estimation

If the equation system is simultaneous, you can combine the 2SLS and SUR methods to take into account both dependent regressors and cross-equation correlation of the errors. This is called three-stage least squares (3SLS). Formally, the $\hat{\theta}$ that minimizes

$$\hat{S}_n = \frac{1}{n} \left( \sum_{t=1}^{n} q(y_t,x_t,\theta) \otimes z_t \right)' \left( \sum_{t=1}^{n} \hat{\Sigma} \otimes z_t z_t' \right)^{-1} \left( \sum_{t=1}^{n} q(y_t,x_t,\theta) \otimes z_t \right)$$

is the 3SLS estimator of the parameters. For more details on 3SLS, see Gallant (1987, p. 435).

Residuals from the 2SLS method are used to estimate the $\Sigma$ matrix required for 3SLS. The results of the preliminary 2SLS step are not printed unless the 2SLS option is also specified.

To use the three-stage least squares method, specify an INSTRUMENTS statement and use the 3SLS or N3SLS option in either the PROC MODEL statement or a FIT statement.
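As a sketch, the hypothetical two-equation system from the 2SLS discussion could be fit by 3SLS as follows:

proc model data=in;
   y1 = a1 + b1 * y2 + c1 * x1;
   y2 = a2 + b2 * y1 + c2 * x2;
   instruments x1 x2;
   fit y1 y2 / 3sls;    /* add the 2SLS option to also print the preliminary 2SLS step */
run;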
Generalized Method of Moments (GMM)

For systems of equations with heteroscedastic errors, generalized method of moments (GMM) can be used to obtain efficient estimates of the parameters. See the section "Heteroscedasticity" on page 1100 for alternatives to GMM.

Consider the nonlinear model

$$\epsilon_t = q(y_t, x_t, \theta)$$
$$z_t = Z(x_t)$$

where $z_t$ is a vector of instruments and $\epsilon_t$ is an unobservable disturbance vector that can be serially correlated and nonstationary. In general, the following orthogonality condition is desired:

$$E(\epsilon_t \otimes z_t) = 0$$

This condition states that the expected crossproducts of the unobservable disturbances, $\epsilon_t$, and functions of the observable variables are set to 0. The first moment of the crossproducts is

$$m_n = \frac{1}{n} \sum_{t=1}^{n} m(y_t, x_t, \theta)$$
$$m(y_t, x_t, \theta) = q(y_t, x_t, \theta) \otimes z_t$$

where $m(y_t,x_t,\theta) \in R^{gk}$. The case where $gk > p$ is considered here, where $p$ is the number of parameters.
Estimate the true parameter vector $\theta^0$ by the value of $\hat{\theta}$ that minimizes

$$S(\theta, V) = [n\, m_n(\theta)]' \, V^{-1} \, [n\, m_n(\theta)] / n$$

where

$$V = \mathrm{Cov}\left( [n\, m_n(\theta^0)],\; [n\, m_n(\theta^0)]' \right)$$

The parameter vector that minimizes this objective function is the GMM estimator. GMM estimation is requested in the FIT statement with the GMM option.

The variance of the moment functions, $V$, can be expressed as

$$V = E\left[ \left( \sum_{t=1}^{n} \epsilon_t \otimes z_t \right) \left( \sum_{s=1}^{n} \epsilon_s \otimes z_s \right)' \right]
   = \sum_{t=1}^{n} \sum_{s=1}^{n} E\left[ (\epsilon_t \otimes z_t)(\epsilon_s \otimes z_s)' \right]
   = n\, S_n^0$$

where $S_n^0$ is estimated as

$$\hat{S}_n = \frac{1}{n} \sum_{t=1}^{n} \sum_{s=1}^{n} \left( q(y_t,x_t,\theta) \otimes z_t \right) \left( q(y_s,x_s,\theta) \otimes z_s \right)'$$

Note that $\hat{S}_n$ is a $gk \times gk$ matrix. Because $\mathrm{Var}(\hat{S}_n)$ does not decrease with increasing $n$, you consider estimators of $S_n^0$ of the form:

$$\hat{S}_n(l(n)) = \sum_{\tau = -n+1}^{n-1} w\!\left( \frac{\tau}{l(n)} \right) \hat{S}_{n,\tau}$$

$$\hat{S}_{n,\tau} =
\begin{cases}
\dfrac{1}{n} \sum\limits_{t=1+\tau}^{n} \left[ q(y_t,x_t,\theta^{\#}) \otimes z_t \right] \left[ q(y_{t-\tau},x_{t-\tau},\theta^{\#}) \otimes z_{t-\tau} \right]' & \tau \ge 0 \\[6pt]
\left( \hat{S}_{n,-\tau} \right)' & \tau < 0
\end{cases}$$
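In practice, GMM estimation is requested with the GMM option in the FIT statement together with an INSTRUMENTS statement. The following is a minimal sketch; the data set, model form, and instruments are hypothetical:

proc model data=in;
   y = a + b * x;
   instruments x z;    /* z is a hypothetical additional instrument */
   fit y / gmm;        /* the weighting of S_n can be controlled with further FIT options */
run;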
Heteroscedasticity Test Results

Test             Statistic    DF    Pr > ChiSq    Variables
White's Test     21.16        4     0.0003        Cross of all vars
Breusch-Pagan    15.83        2     0.0004        1, income, incsq
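Tests of this kind are requested with FIT statement options. The following hedged sketch uses the variable names income and incsq from the results above; the model form and data set are hypothetical:

proc model data=in;
   y = a + b*income + c*incsq;
   fit y / white breusch=(1 income incsq);   /* White's test and the Breusch-Pagan test */
run;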
Correcting for Heteroscedasticity

There are two methods for improving the efficiency of the parameter estimation in the presence of heteroscedastic errors. If the error variance relationships are known, weighted regression can be used or an error model can be estimated. For details about error model estimation, see the section "Error Covariance Structure Specification" on page 1112. If the error variance relationship is unknown, GMM estimation can be used.

Weighted Regression
The WEIGHT statement can be used to correct for the heteroscedasticity. Consider the following model, which has a heteroscedastic error term:

$$y_t = 250 \left( e^{-0.2t} - e^{-0.8t} \right) + \sqrt{9/t}\; \epsilon_t$$

The data for this model is generated with the following SAS statements.

data test;
   do t=1 to 25;
      y = 250 * (exp( -0.2 * t ) - exp( -0.8 * t ))
          + sqrt( 9 / t ) * rannor(1);
      output;
   end;
run;
If this model is estimated with OLS, as shown in the following statements, the estimates shown in Figure 18.40 are obtained for the parameters.

proc model data=test;
   parms b1 0.1 b2 0.9;
   y = 250 * ( exp( -b1 * t ) - exp( -b2 * t ) );
   fit y;
run;
Figure 18.40 Unweighted OLS Estimates

The MODEL Procedure
Nonlinear OLS Parameter Estimates

Parameter    Estimate    Approx Std Err    t Value    Approx Pr > |t|
b1           0.200977    0.00101           198.60     <.0001
b2           0.826236    0.00853           96.82      <.0001
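Because the error variance here is proportional to 9/t, one way to improve the estimates is to weight each observation by the reciprocal of its error variance. The following sketch does this through the _WEIGHT_ program variable; it is one possible weighting under the assumption that the variance form 9/t is known, not the only correct choice:

proc model data=test;
   parms b1 0.1 b2 0.9;
   y = 250 * ( exp( -b1 * t ) - exp( -b2 * t ) );
   _weight_ = t / 9;    /* reciprocal of the assumed error variance 9/t */
   fit y;
run;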
Figure 18.54 Hausman's Specification Test Results

Comparing    DF    Statistic    Pr > ChiSq
OLS, 2SLS    6     13.86        0.0313
Figure 18.54 indicates that 2SLS is preferred over OLS at the 5% level of significance. In this case, the null hypothesis of no measurement error is rejected. Hence, the instrumental variable estimator is required for this example because of the presence of measurement error.
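A Hausman test of this kind can be requested with the HAUSMAN option in the FIT statement. The following minimal sketch, with a hypothetical model and instruments, compares the OLS and 2SLS estimators:

proc model data=in;
   y = a + b * x;
   instruments z1 z2;
   fit y / ols 2sls hausman;
run;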
Chow Tests

The Chow test is used to test for break points or structural changes in a model. The problem is posed as a partitioning of the data into two parts of size $n_1$ and $n_2$. The null hypothesis to be tested is

$$H_0: \beta_1 = \beta_2 = \beta$$

where $\beta_1$ is estimated by using the first part of the data and $\beta_2$ is estimated by using the second part.

The test is performed as follows (see Davidson and MacKinnon 1993, p. 380).

1. The $p$ parameters of the model are estimated.

2. A second linear regression is performed on the residuals, $\hat{u}$, from the nonlinear estimation in step one,

$$\hat{u} = \hat{X} b + \text{residuals}$$

where $\hat{X}$ is Jacobian columns that are evaluated at the parameter estimates. If the estimation is an instrumental variables estimation with matrix of instruments $W$, then the following regression is performed:

$$\hat{u} = P_W \hat{X} b + \text{residuals}$$

where $P_W$ is the projection matrix.

3. The restricted SSE (RSSE) from this regression is obtained. An SSE for each subsample is then obtained by using the same linear regression.

4. The F statistic is then

$$f = \frac{(\mathrm{RSSE} - \mathrm{SSE}_1 - \mathrm{SSE}_2)/p}{(\mathrm{SSE}_1 + \mathrm{SSE}_2)/(n - 2p)}$$

This test has $p$ and $n - 2p$ degrees of freedom.

Chow's test is not applicable if $\min(n_1, n_2) < p$, since one of the two subsamples does not contain enough data to estimate $\beta$. In this instance, the predictive Chow test can be used. The predictive Chow test is defined as

$$f = \frac{(\mathrm{RSSE} - \mathrm{SSE}_1)(n_1 - p)}{\mathrm{SSE}_1 \, n_2}$$

where $n_1 > p$. This test can be derived from the Chow test by noting that $\mathrm{SSE}_2 = 0$ when $n_2 \le p$.
Chow's Test Results

Break Point    Num DF    Den DF    F Value
40             2         96        12.95
50             2         96        101.37
60             2         96        26.43
90             11        87        1.86
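Chow tests are requested with the CHOW= option, and predictive Chow tests with the PCHOW= option, in the FIT statement. The following hedged sketch matches the break points in the table above; the model itself is hypothetical:

proc model data=in;
   y = a + b * x;
   fit y / chow=(40 50 60) pchow=90;
run;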
%AR Macro Syntax

The general form of the %AR macro call is

%AR( name, nlag < , endolist < , laglist > > < , M=method > < , TYPE=V > ) ;
where name
specifies a prefix for %AR to use in constructing names of variables needed to define the AR process. If the endolist is not specified, the endogenous list defaults to name, which must be the name of the equation to which the AR error process is to be applied. The name value cannot exceed 32 characters.
nlag
is the order of the AR process.
endolist
specifies the list of equations to which the AR process is to be applied. If more than one name is given, an unrestricted vector process is created with the structural residuals of all the equations included as regressors in each of the equations. If not specified, endolist defaults to name.
laglist
specifies the list of lags at which the AR terms are to be added. The coefficients of the terms at lags not listed are set to 0. All of the listed lags must be less than or equal to nlag, and there must be no duplicates. If not specified, the laglist defaults to all lags 1 through nlag.
M=method
specifies the estimation method to implement. Valid values of M= are CLS (conditional least squares estimates), ULS (unconditional least squares estimates), and ML (maximum likelihood estimates). M=CLS is the default. Only M=CLS is allowed when more than one equation is specified. The ULS and ML methods are not supported for vector AR models by %AR.

TYPE=V
specifies that the AR process is to be applied to the endogenous variables themselves instead of to the structural residuals of the equations.
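For example, a single-equation model with AR(2) errors estimated by the default conditional least squares method might be specified as follows; the data set and regressor are hypothetical, and the endogenous list defaults to the equation name:

proc model data=in;
   y = a + b * x;       /* structural part of the equation */
   %ar( y, 2 )          /* adds an AR(2) process to the Y equation errors */
   fit y;
run;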
Restricted Vector Autoregression

You can control which parameters are included in the process, restricting to 0 those parameters that you do not include. First, use %AR with the DEFER option to declare the variable list and define the dimension of the process. Then, use additional %AR calls to generate terms for selected equations with selected variables at selected lags. For example,

proc model data=d;
   y1 = ... equation for y1 ...;
   y2 = ... equation for y2 ...;
   y3 = ... equation for y3 ...;
   %ar( name, 2, y1 y2 y3, defer )
   %ar( name, y1, y1 y2 )
   %ar( name, y2 y3, , 1 )
   fit y1 y2 y3;
run;
The error equations produced are as follows:

y1 = pred.y1 + name1_1_1*zlag1(y1-name_y1) + name1_1_2*zlag1(y2-name_y2) +
     name2_1_1*zlag2(y1-name_y1) + name2_1_2*zlag2(y2-name_y2) ;

y2 = pred.y2 + name1_2_1*zlag1(y1-name_y1) + name1_2_2*zlag1(y2-name_y2) +
     name1_2_3*zlag1(y3-name_y3) ;

y3 = pred.y3 + name1_3_1*zlag1(y1-name_y1) + name1_3_2*zlag1(y2-name_y2) +
     name1_3_3*zlag1(y3-name_y3) ;
This model states that the errors for Y1 depend on the errors of both Y1 and Y2 (but not Y3) at both lags 1 and 2, and that the errors for Y2 and Y3 depend on the previous errors for all three variables, but only at lag 1.
%AR Macro Syntax for Restricted Vector AR

An alternative use of %AR is allowed to impose restrictions on a vector AR process by calling %AR several times to specify different AR terms and lags for different equations. The first call has the general form

%AR( name, nlag, endolist, DEFER ) ;
where
name
specifies a prefix for %AR to use in constructing names of variables needed to define the vector AR process.
nlag
specifies the order of the AR process.
endolist
specifies the list of equations to which the AR process is to be applied.
DEFER
specifies that %AR is not to generate the AR process but is to wait for further information specified in later %AR calls for the same name value.
The subsequent calls have the general form

%AR( name, eqlist, varlist, laglist, TYPE= )
where name
is the same as in the first call.
eqlist
specifies the list of equations to which the specifications in this %AR call are to be applied. Only names specified in the endolist value of the first call for the name value can appear in the list of equations in eqlist.
varlist
specifies the list of equations whose lagged structural residuals are to be included as regressors in the equations in eqlist. Only names in the endolist of the first call for the name value can appear in varlist. If not specified, varlist defaults to endolist.
laglist
specifies the list of lags at which the AR terms are to be added. The coefficients of the terms at lags not listed are set to 0. All of the listed lags must be less than or equal to the value of nlag, and there must be no duplicates. If not specified, laglist defaults to all lags 1 through nlag.
The %MA Macro

The SAS macro %MA generates programming statements for PROC MODEL for moving-average models. The %MA macro is part of SAS/ETS software, and no special options are needed to use the macro. The moving-average error process can be applied to the structural equation errors. The syntax of the %MA macro is the same as the %AR macro except there is no TYPE= argument.

When you are using the %MA and %AR macros combined, the %MA macro must follow the %AR macro. The following SAS/IML statements produce an ARMA(1, (1 3)) error process and save it in the data set MADAT2.

/* use IML module to simulate a MA process */
proc iml;
   phi = { 1 .2 };
   theta = { 1 .3 0 .5 };
   y = armasim( phi, theta, 0, .1, 200, 32565 );
   create madat2 from y[colname='y'];
   append from y;
quit;
The following PROC MODEL statements are used to estimate the parameters of this model by using maximum likelihood error structure:

title 'Maximum Likelihood ARMA(1, (1 3))';

proc model data=madat2;
   y=0;
   %ar( y, 1, , M=ml )
   %ma( y, 3, , 1 3, M=ml )   /* %MA always after %AR */
   fit y;
run;

title;
The estimates of the parameters produced by this run are shown in Figure 18.61.

Figure 18.61 Estimates from an ARMA(1, (1 3)) Process

Maximum Likelihood ARMA(1, (1 3))
The MODEL Procedure

Nonlinear OLS Summary of Residual Errors

Equation    DF Model    DF Error    SSE       MSE       Root MSE    R-Square    Adj R-Sq
y           3           197         2.6383    0.0134    0.1157      -0.0067     -0.0169
RESID.y                 197         1.9957    0.0101    0.1007

Nonlinear OLS Parameter Estimates

Parameter    Estimate    Approx Std Err    t Value    Approx Pr > |t|
y_l1         -0.10067    0.1187            -0.85      0.3973
y_m1         -0.1934     0.0939            -2.06      0.0408
y_m3         -0.59384    0.0601            -9.88      <.0001
The general form of the %MA macro call is

%MA( name, nlag < , endolist < , laglist > > < , M=method > ) ;
where name
specifies a prefix for %MA to use in constructing names of variables needed to define the MA process and is the default endolist.
nlag
is the order of the MA process.
endolist
specifies the equations to which the MA process is to be applied. If more than one name is given, CLS estimation is used for the vector process.
laglist
specifies the lags at which the MA terms are to be added. All of the listed lags must be less than or equal to nlag, and there must be no duplicates. If not specified, the laglist defaults to all lags 1 through nlag.
M=method
specifies the estimation method to implement. Valid values of M= are CLS (conditional least squares estimates), ULS (unconditional least squares estimates), and ML (maximum likelihood estimates). M=CLS is the default. Only M=CLS is allowed when more than one equation is specified in the endolist.
%MA Macro Syntax for Restricted Vector Moving-Average

An alternative use of %MA is allowed to impose restrictions on a vector MA process by calling %MA several times to specify different MA terms and lags for different equations. The first call has the general form

%MA( name, nlag, endolist, DEFER ) ;
where name
specifies a prefix for %MA to use in constructing names of variables needed to define the vector MA process.
nlag
specifies the order of the MA process.
endolist
specifies the list of equations to which the MA process is to be applied.
DEFER
specifies that %MA is not to generate the MA process but is to wait for further information specified in later %MA calls for the same name value.
The subsequent calls have the general form %MA( name, eqlist, varlist, laglist )
where name
is the same as in the first call.
eqlist
specifies the list of equations to which the specifications in this %MA call are to be applied.
varlist
specifies the list of equations whose lagged structural residuals are to be included as regressors in the equations in eqlist.
laglist
specifies the list of lags at which the MA terms are to be added.
Distributed Lag Models and the %PDL Macro

In the following example, the variable y is modeled as a linear function of x, the first lag of x, the second lag of x, and so forth:

$$y_t = a + b_0 x_t + b_1 x_{t-1} + b_2 x_{t-2} + b_3 x_{t-3} + \ldots + b_n x_{t-n}$$
Models of this sort can introduce a great many parameters for the lags, and there may not be enough data to compute accurate independent estimates for them all. Often, the number of parameters is reduced by assuming that the lag coefficients follow some pattern. One common assumption is that the lag coefficients follow a polynomial in the lag length

$$b_i = \sum_{j=0}^{d} \alpha_j \, (i)^j$$

where $d$ is the degree of the polynomial used. Models of this kind are called Almon lag models, polynomial distributed lag models, or PDLs for short. For example, Figure 18.62 shows the lag distribution that can be modeled with a low-order polynomial. Endpoint restrictions can be imposed on a PDL to require that the lag coefficients be 0 at the 0th lag, or at the final lag, or at both.

Figure 18.62 Polynomial Distributed Lags
For linear single-equation models, SAS/ETS software includes the PDLREG procedure for estimating PDL models. See Chapter 20, “The PDLREG Procedure,” for a more detailed discussion of polynomial distributed lags and an explanation of endpoint restrictions.
Polynomial and other distributed lag models can be estimated and simulated or forecast with PROC MODEL. For polynomial distributed lags, the %PDL macro can generate the needed programming statements automatically.
The %PDL Macro

The SAS macro %PDL generates the programming statements to compute the lag coefficients of polynomial distributed lag models and to apply them to the lags of variables or expressions.

To use the %PDL macro in a model program, you first call it to declare the lag distribution; later, you call it again to apply the PDL to a variable or expression. The first call generates a PARMS statement for the polynomial parameters and assignment statements to compute the lag coefficients. The second call generates an expression that applies the lag coefficients to the lags of the specified variable or expression. A PDL can be declared only once, but it can be used any number of times (that is, the second call can be repeated).

The initial declaratory call has the general form

%PDL ( pdlname, nlags, degree , R=code , OUTEST=dataset ) ;
where pdlname is a name (up to 32 characters) that you give to identify the PDL, nlags is the lag length, and degree is the degree of the polynomial for the distribution. The R=code is optional for endpoint restrictions. The value of code can be FIRST (for upper), LAST (for lower), or BOTH (for both upper and lower endpoints). See Chapter 20, "The PDLREG Procedure," for a discussion of endpoint restrictions. The option OUTEST=dataset creates a data set that contains the estimates of the parameters and their covariance matrix.

The later calls to apply the PDL have the general form

%PDL( pdlname, expression )
where pdlname is the name of the PDL and expression is the variable or expression to which the PDL is to be applied. The pdlname given must be the same as the name used to declare the PDL. The following statements produce the output in Figure 18.63:

proc model data=in list;
   parms int pz;
   %pdl(xpdl,5,2);
   y = int + pz * z + %pdl(xpdl,x);
   %ar(y,2,M=ULS);
   id i;
   fit y / out=model1 outresid converge=1e-6;
run;
Figure 18.63 %PDL Macro Estimates

The MODEL Procedure
Nonlinear OLS Estimates

Term       Estimate    Approx Std Err    t Value    Approx Pr > |t|
XPDL_L0    1.568788    0.0935            16.77      <.0001
If this differential equation is integrated too far in time, y exceeds the maximum value allowed on the computer, and the integration terminates. Likewise, differential systems that are singular cannot be solved or estimated in general. For example, consider the following differential system:

$$x' = -y' + 2x + 4y + \exp(t)$$
$$y' = -x' + y + \exp(4t)$$
This system has an analytical solution, but an accurate numerical solution is very difficult to obtain. The reason is that $y'$ and $x'$ cannot be isolated on the left-hand side of the equation. If the equation is modified slightly to

$$x' = -y' + 2x + 4y + \exp(t)$$
$$y' = x' + y + \exp(4t)$$
the system is nonsingular, but the integration process could still fail or be extremely slow. If the MODEL procedure encounters either system, a warning message is issued. This system can be rewritten as the following recursive system, which can be estimated and simulated successfully with the MODEL procedure:

$$x' = x + 1.5y + 0.5\exp(t) - 0.5\exp(4t)$$
$$y' = x' + y + \exp(4t)$$
Petzold (1982) mentions a class of differential algebraic equations that, when integrated numerically, could produce incorrect or misleading results. An example of such a system is

$$y_2'(t) = y_1(t) + g_1(t)$$
$$0 = y_2(t) + g_2(t)$$

The analytical solution to this system depends on g and its derivatives at the current time only and not on its initial value or past history. You should avoid systems of this and other similar forms mentioned in Petzold (1982).
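In PROC MODEL, a differential equation is written by assigning to a DERT.-prefixed variable (described in the section "Equation Translations" on page 1204). The following is a minimal sketch for a single linear differential equation, with a hypothetical data set and parameter:

proc model data=in;
   parms a 0.5;
   dert.y = a * y;   /* defines the differential equation y' = a*y */
   fit y;            /* y is matched to the integrated solution */
run;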
SOLVE Data Sets

SDATA= Input Data Set

The SDATA= option reads a cross-equation covariance matrix from a data set. The covariance matrix read from the SDATA= data set specified in the SOLVE statement is used to generate random equation errors when the RANDOM= option specifies Monte Carlo simulation.

Typically, the SDATA= data set is created by the OUTS= option in a previous FIT statement. (The OUTS= data set from a FIT statement can be read back in by a SOLVE statement in the same PROC MODEL step.)

You can create an input SDATA= data set by using the DATA step. PROC MODEL expects to find a character variable _NAME_ in the SDATA= data set as well as variables for the equations in the estimation or solution. For each observation with a _NAME_ value that matches the name of an equation, PROC MODEL fills the corresponding row of the S matrix with the values of the names of equations found in the data set. If a row or column is omitted from the data set, an identity matrix row or column is assumed. Missing values are ignored. Since the S matrix is symmetric, you can include only a triangular part of the S matrix in the SDATA= data set with the omitted part indicated by missing values. If the SDATA= data set contains multiple observations with the same _NAME_, the last values supplied for the _NAME_ variable are used. The section "OUTS= Data Set" on page 1162 contains more details on the format of this data set.

Use the TYPE= option to specify the type of estimation method used to produce the S matrix you want to input.
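For example, a DATA step such as the following could supply a two-equation S matrix; the equation names and values are hypothetical, and only the lower triangle is given because the matrix is symmetric:

data s;
   length _name_ $ 32;
   input _name_ $ y1 y2;
datalines;
y1  0.5  .
y2  0.1  0.3
;

This data set could then be named in the SDATA= option of a SOLVE statement.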
ESTDATA= Input Data Set

The ESTDATA= option specifies an input data set that contains an observation with values for some or all of the model parameters. It can also contain observations with the rows of a covariance matrix for the parameters.

When the ESTDATA= option is used, parameter values are set from the first observation. If the RANDOM= option is used and the ESTDATA= data set contains a covariance matrix, the covariance matrix of the parameter estimates is read and used to generate pseudo-random shocks to the model parameters for Monte Carlo simulation. These random perturbations have a multivariate normal distribution with the covariance matrix read from the ESTDATA= data set.

The ESTDATA= data set is usually created by the OUTEST= option in a FIT statement. The OUTEST= data set contains the parameter estimates produced by the FIT statement and also contains the estimated covariance of the parameter estimates if the OUTCOV option is used. This OUTEST= data set can be read in by the ESTDATA= option in a SOLVE statement.

You can also create an ESTDATA= data set with a SAS DATA step program. The data set must contain a numeric variable for each parameter to be given a value or covariance column. The name of the variable in the ESTDATA= data set must match the name of the parameter in the model. Parameters with names longer than 32 characters cannot be set from an ESTDATA= data set. The
data set must also contain a character variable _NAME_ of length 32. _NAME_ has a blank value for the observation that gives values to the parameters. _NAME_ contains the name of a parameter for observations that define rows of the covariance matrix. More than one set of parameter estimates and covariances can be stored in the ESTDATA= data set if the observations for the different estimates are identified by the variable _TYPE_. _TYPE_ must be a character variable of length eight. The TYPE= option is used to select for input the part of the ESTDATA= data set for which the value of the _TYPE_ variable matches the value of the TYPE= option.
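Putting these pieces together, a Monte Carlo simulation might read both input data sets, as in this hypothetical sketch (the model file and data set names are placeholders):

proc model data=in model=mymod;   /* MYMOD is a hypothetical stored model file */
   solve y / estdata=est sdata=s random=100 seed=123 out=monte;
run;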
OUT= Data Set

The OUT= data set contains solution values, residual values, and actual values of the solution variables. The OUT= data set contains the following variables:

- BY variables

- RANGE variable

- ID variables

- _TYPE_, a character variable of length eight that identifies the type of observation. The _TYPE_ variable can be PREDICT, RESIDUAL, ACTUAL, or ERROR.

- _MODE_, a character variable of length eight that identifies the solution mode. _MODE_ takes the value FORECAST or SIMULATE.

- if lags are used, a numeric variable, _LAG_, that contains the number of dynamic lags that contribute to the solution. The value of _LAG_ is always zero for STATIC mode solutions. _LAG_ is set to a missing value for lag-starting observations.

- if the RANDOM= option is used, _REP_, a numeric variable that contains the replication number. For example, if RANDOM=10, each input observation results in eleven output observations with _REP_ values 0 through 10. The observations with _REP_=0 are from the unperturbed solution. (The random-number generator functions are suppressed, and the parameter and endogenous perturbations are zero when _REP_=0.)

- _ERRORS_, a numeric variable that contains the number of errors that occurred during the execution of the program for the last iteration for the observation. If the solution failed to converge, this is counted as one error, and the _ERRORS_ variable is made negative.

- solution and other variables. The solution variables contain solution or predicted values for _TYPE_=PREDICT observations, residuals for _TYPE_=RESIDUAL observations, or actual values for _TYPE_=ACTUAL observations. The other model variables, and any other variables read from the input data set, are always actual values from the input data set.

- any other variables named in the OUTVARS statement. These can be program variables computed by the model program, CONTROL variables, parameters, or special variables in
the model program. Compound variable names longer than 32 characters are truncated in the OUT= data set. By default, only the predicted values are written to the OUT= data set. The OUTRESID, OUTACTUAL, and OUTERROR options are used to add the residual, actual, and ERROR. values, respectively, to the data set. For examples of the OUT= data set, see Example 18.6.
DATA= Input Data Set

The input data set should contain all of the exogenous variables and should supply nonmissing values for them for each period to be solved. Solution variables can be supplied in the input data set and are used as follows:

- to supply initial lags. For example, if the lag length of the model is three, three observations are read in to feed the lags before any solutions are computed.

- to evaluate the goodness of fit. Goodness-of-fit measures are computed based on the difference between the solved values and the actual values supplied from the data set.

- to supply starting values for the iterative solution. If the value from the input data set for a solution variable is missing, the starting value for it is taken from the solution of the last period (if nonmissing) or else the solution estimate is started at zero.

- for STATIC mode solutions, actual values from the data set are used by the lagging functions for the solution variables.

- for FORECAST mode solutions, actual values from the data set are used as the solution values when nonmissing.
Programming Language Overview: MODEL Procedure
Variables in the Model Program

Variable names are alphanumeric but must start with a letter. The length is limited to 32 characters.

PROC MODEL uses several classes of variables, and different variable classes are treated differently. The variable class is controlled by declaration statements: the VAR, ENDOGENOUS, and EXOGENOUS statements for model variables, the PARAMETERS statement for parameters, and the CONTROL statement for control class variables. These declaration statements have several valid abbreviations. Various internal variables are also made available to the model program to allow
communication between the model program and the procedure. RANGE, ID, and BY variables are also available to the model program. Those variables not declared as any of the preceding classes are program variables. Some classes of variables can be lagged; that is, their value at each observation is remembered, and previous values can be referred to by the lagging functions. Other classes have only a single value and are not affected by lagging functions. For example, parameters have only one value and are not affected by lagging functions; therefore, if P is a parameter, DIFn (P) is always 0, and LAGn (P) is always the same as P for all values of n. The different variable classes and their roles in the model are described in the following.
Model Variables

Model variables are declared by VAR, ENDOGENOUS, or EXOGENOUS statements, or by FIT and SOLVE statements. The model variables are the variables that the model is intended to explain or predict.

PROC MODEL enables you to use expressions on the left-hand side of the equal sign to define model equations. For example, a log-linear model for Y can be written as

log( y ) = a + b * x;
Previously, only a variable name was allowed on the left-hand side of the equal sign. The text on the left-hand side of the equation serves as the equation name used to identify the equation in printed output, in the OUT= data sets, and in FIT or SOLVE statements. To refer to equations specified by using left-hand side expressions (in the FIT statement, for example), place the left-hand side expression in quotes. For example, the following statements fit a log-linear model to the dependent variable Y:

proc model data=in;
   log( y ) = a + b * x;
   fit "log(y)";
run;
The estimation and simulation is performed by transforming the models into general form equations. No actual or predicted value is available for general form equations, so no R2 or adjusted R2 is computed.
Equation Variables

An equation variable is one of several special variables used by PROC MODEL to control the evaluation of model equations. An equation variable name consists of one of the prefixes EQ, RESID, ERROR, PRED, or ACTUAL, followed by a period and the name of a model equation.
Equation variable names can appear in parts of the PROC MODEL printed output, and they can be used in the model program. For example, RESID-prefixed variables can be used in LAG functions to define equations with moving-average error terms. See the section “Autoregressive Moving-Average Error Processes” on page 1138 for details. The meaning of these prefixes is detailed in the section “Equation Translations” on page 1204.
Parameters

Parameters are variables that have the same value for each observation. Parameters can be given values or can be estimated by fitting the model to data. During the SOLVE stage, parameters are treated as constants. If no estimation is performed, the SOLVE stage uses the initial value provided in the ESTDATA= data set, the MODEL= file, or in the PARAMETER statement, as the value of the parameter.

The PARAMETERS statement declares the parameters of the model. Parameters are not lagged, and they cannot be changed by the model program.
Control Variables

Control variables supply constant values to the model program that can be used to control the model in various ways. The CONTROL statement declares control variables and specifies their values. A control variable is like a parameter except that it has a fixed value and is not estimated from the data.

Control variables are not reinitialized before each pass through the data and can thus be used to retain values between passes. You can use control variables to vary the program logic. Control variables are not affected by lagging functions.

For example, if you have two versions of an equation for a variable Y, you could put both versions in the model and, by using a CONTROL statement to select one of them, produce two different solutions to explore the effect the choice of equation has on the model, as shown in the following statements:

select (case);
   when (1) y = ...first version of equation... ;
   when (2) y = ...second version of equation... ;
end;

control case 1;
solve / out=case1;
run;

control case 2;
solve / out=case2;
run;
RANGE, ID, and BY Variables

The RANGE statement controls the range of observations in the input data set that is processed by PROC MODEL. The ID statement lists variables in the input data set that are used to identify observations in the printout and in the output data set. The BY statement can be used to make PROC MODEL perform a separate analysis for each BY group. The variable in the RANGE statement, the ID variables, and the BY variables are available for the model program to examine, but their values should not be changed by the program. The BY variables are not affected by lagging functions.
Internal Variables

You can use several internal variables in the model program to communicate with the procedure. For example, if you want PROC MODEL to list the values of all the variables when more than 10 iterations are performed and the procedure is past the 20th observation, you can write

if _obs_ > 20 then if _iter_ > 10 then _list_ = 1;
Internal variables are not affected by lagging functions, and they cannot be changed by the model program except as noted. The following internal variables are available. The variables are all numeric except where noted. _ERRORS_
is a flag that is set to 0 at the start of program execution and is set to a nonzero value whenever an error occurs. The program can also set the _ERRORS_ variable.
_ITER_
is the iteration number. For FIT tasks, the value of _ITER_ is negative for preliminary grid-search passes. The iterative phase of the estimation starts with iteration 0. After the estimates have converged, a final pass is made to collect statistics with _ITER_ set to a missing value. Note that at least one pass, and perhaps several subiteration passes as well, is made for each iteration. For SOLVE tasks, _ITER_ counts the iterations used to compute the simultaneous solution of the system.
_LAG_
is the number of dynamic lags that contribute to the solution at the current observation. _LAG_ is always 0 for FIT tasks and for STATIC solutions. _LAG_ is set to a missing value during the lag starting phase.
_LIST_
is a list flag that is set to 0 at the start of program execution. The program can set _LIST_ to a nonzero value to request a listing of the values of all the variables in the program after the program has finished executing.
_METHOD_
is the solution method in use for SOLVE tasks. _METHOD_ is set to a blank value for FIT tasks. _METHOD_ is a character-valued variable. Values are NEWTON, JACOBI, SEIDEL, or ONEPASS.
_MODE_
takes the value ESTIMATE for FIT tasks and the value SIMULATE or FORECAST for SOLVE tasks. _MODE_ is a character-valued variable.
_NMISS_
is the number of missing or otherwise unusable observations during the model estimation. For FIT tasks, _NMISS_ is initially set to 0; at the start of each
iteration, _NMISS_ is set to the number of unusable observations for the previous iteration. For SOLVE tasks, _NMISS_ is set to a missing value. _NUSED_
is the number of nonmissing observations used in the estimation. For FIT tasks, PROC MODEL initially sets _NUSED_ to the number of parameters; at the start of each iteration, _NUSED_ is reset to the number of observations used in the previous iteration. For SOLVE tasks, _NUSED_ is set to a missing value.
_OBS_
counts the observations being processed. _OBS_ is negative or 0 for observations in the lag starting phase.
_REP_
is the replication number for Monte Carlo simulation when the RANDOM= option is specified in the SOLVE statement. _REP_ is 0 when the RANDOM= option is not used and for FIT tasks. When _REP_=0, the random-number generator functions always return 0.
_WEIGHT_
is the weight of the observation. For FIT tasks, _WEIGHT_ provides a weight for the observation in the estimation. _WEIGHT_ is initialized to 1.0 at the start of execution for FIT tasks. For SOLVE tasks, _WEIGHT_ is ignored.
Program Variables

Variables not in any of the other classes are called program variables. Program variables are used to hold intermediate results of calculations. Program variables are reinitialized to missing values before each observation is processed. Program variables can be lagged. The RETAIN statement can be used to give program variables initial values and enable them to keep their values between observations.
Character Variables

PROC MODEL supports both numeric and character variables. Character variables are not involved in the model specification but can be used to label observations, to write debugging messages, or for documentation purposes. All variables are numeric unless they are one of the following:

- character variables in a DATA= SAS data set

- program variables assigned a character value

- variables declared to be character by a LENGTH or ATTRIB statement
Equation Translations

Equations written in normalized form are always automatically converted to general form equations. For example, when a normalized form equation such as

y = a + b*x;
is encountered, it is translated into the equations

PRED.y = a + b*x;
RESID.y = PRED.y - ACTUAL.y;
ERROR.y = PRED.y - y;

If the same system is expressed as the following general form equation, then this equation is used unchanged.

EQ.y = y - (a + b*x);

This makes it easy to solve for arbitrary variables and to modify the error terms for autoregressive or moving average models.

Use the LIST option to see how this transformation is performed. For example, the following statements produce the listing shown in Figure 18.84.

proc model data=line list;
   y = a1 + b1*x1 + c1*x2;
   fit y;
run;
Figure 18.84 LIST Output

The MODEL Procedure
Listing of Compiled Program Code

Stmt    Line:Col    Statement as Parsed
1       3884:4      PRED.y = a1 + b1 * x1 + c1 * x2;
1       3884:4      RESID.y = PRED.y - ACTUAL.y;
1       3884:4      ERROR.y = PRED.y - y;
PRED.Y is the predicted value of Y, and ACTUAL.Y is the value of Y in the data set. The predicted value minus the actual value, RESID.Y, is then the error term, $\epsilon$, for the original Y equation. Note that the residuals obtained from the OUTRESID option in the OUT= data set for both the FIT and SOLVE statements are defined as actual $-$ predicted, the negative of RESID.Y. See the section "Syntax: MODEL Procedure" on page 1012 for details.

ACTUAL.Y and Y have the same value for parameter estimation. For solve tasks, ACTUAL.Y is still the value of Y in the data set, but Y becomes the solved value: the value that satisfies PRED.Y - Y = 0.

The following are the equation variable definitions.

EQ.
The value of an EQ.-prefixed equation variable (normally used to define a general form equation) represents the failure of the equation to hold. When the EQ.name variable is 0, the name equation is satisfied.
RESID.
The RESID.name variables represent the stochastic parts of the equations and are used to define the objective function for the estimation process. A RESID.-prefixed equation variable is like an EQ.-prefixed variable but makes it possible to use or transform the stochastic part of the equation. The RESID. equation is used in place of the ERROR. equation for model solutions if it has been reassigned or used in the equation.

ERROR.
An ERROR.name variable is like an EQ.-prefixed variable, except that it is used only for model solution and does not affect parameter estimation.
PRED.
For a normalized form equation (specified by assignment to a model variable), the PRED.name equation variable holds the predicted value, where name is the name of both the model variable and the corresponding equation. (PRED.-prefixed variables are not created for general form equations.)
ACTUAL.
For a normalized form equation (specified by assignment to a model variable), the ACTUAL.name equation variable holds the value of the name model variable read from the input data set.
DERT.
The DERT.name variable defines a differential equation. Once defined, it might be used on the right-hand side of another equation.
H.
The H.name variable specifies the functional form for the variance of the named equation.
GMM_H.
This is created for H.vars and is the moment equation for the variance for GMM. This variable is used only for GMM. GMM_H.name = RESID.name**2 - H.name;
MSE.
The MSE.y variable contains the value of the mean squared error for y at each iteration. An MSE. variable is created for each dependent/endogenous variable in the model. These variables can be used to specify the missing lagged values in the estimation and simulation of GARCH type models.

demret = intercept;
h.demret = arch0 + arch1 * xlag( resid.demret ** 2, mse.demret )
                 + garch1 * xlag( h.demret, mse.demret );
NRESID.
This is created for H.vars and is the normalized residual of the variable name. The formula is

NRESID.name = RESID.name / sqrt(H.name);
The three equation variable prefixes, RESID., ERROR., and EQ. allow for control over the objective function for the FIT, the SOLVE, or both the FIT and the SOLVE stages. For FIT tasks, PROC MODEL looks first for a RESID.name variable for each equation. If defined, the RESID.-prefixed equation variable is used to define the objective function for the parameter estimation process. Otherwise, PROC MODEL looks for an EQ.-prefixed variable for the equation and uses it instead. For SOLVE tasks, PROC MODEL looks first for an ERROR.name variable for each equation. If defined, the ERROR.-prefixed equation variable is used for the solution process. Otherwise, PROC MODEL looks for an EQ.-prefixed variable for the equation and uses it instead. To solve the simultaneous equation system, PROC MODEL computes values of the solution variables (the model variables being solved for) that make all of the ERROR.name and EQ.name variables close to 0.
Derivatives

Nonlinear modeling techniques require the calculation of derivatives of certain variables with respect to other variables. The MODEL procedure includes an analytic differentiator that determines the model derivatives and generates program code to compute these derivatives. When parameters are estimated, the MODEL procedure takes the derivatives of the equation with respect to the parameters. When the model is solved, Newton's method requires the derivatives of the equations with respect to the variables solved for.

PROC MODEL uses exact mathematical formulas for derivatives of non-user-defined functions. For other functions, numerical derivatives are computed and used.

The differentiator differentiates the entire model program, including the conditional logic and flow of control statements. Delayed definitions, as when the LAG of a program variable is referred to before the variable is assigned a value, are also differentiated correctly.

The differentiator includes optimization features that produce efficient code for the calculation of derivatives. However, when flow of control statements such as GOTO statements are used, the optimization process is impeded, and less efficient code for derivatives might be produced. Optimization is also reduced by conditional statements, iterative DO loops, and multiple assignments to the same variable.

The table of derivatives is printed with the LISTDER option. The code generated for the computation of the derivatives is printed with the LISTCODE option.
Derivative Variables

When the differentiator needs to generate code to evaluate the expression for the derivative of a variable, the result is stored in a special derivative variable. Derivative variables are not created when the derivative expression reduces to a previously computed result, a variable, or a constant. The names of derivative variables, which might sometimes appear in the printed output, have the form @obj/@wrt, where obj is the variable whose derivative is being taken and wrt is the variable that the differentiation is with respect to. For example, the derivative variable for the derivative of Y with respect to X is named @Y/@X.

The derivative variables can be accessed or used as part of the model program using the GETDER() function.

GETDER(x, a)       the derivative of x with respect to a
GETDER(x, a, b)    the second derivative of x with respect to a and b
The main purpose of the GETDER() function is for surfacing the derivatives so they can be stored in a data set for further processing. Only derivatives that are implied by the problem are available to the GETDER() function. When derivatives are requested that aren’t already created, a missing value will be returned. The derivative of the GETDER() function is always zero so the results of the GETDER() function shouldn’t be used in any of the equations in the FIT or the SOLVE statement.
The following example adds the gradient of the PRED.y value with respect to the parameters to the OUT= data set.

proc model data=line;
   y = a1 + b1**2 *x1 + c1*x2;
   Dy_a1 = getder(PRED.y,a1);
   Dy_b1 = getder(PRED.y,b1);
   Dy_c1 = getder(PRED.y,c1);
   outvars Dy_a1 Dy_b1 Dy_c1;
   fit y / out=grad;
run;
Mathematical Functions

The following is a brief summary of SAS functions that are useful for defining models. Additional functions and details are in SAS Language: Reference. Information about creating new functions can be found in SAS/BASE Software: Procedure Reference, Chapter 18, "The FCMP Procedure."

ABS(x)
the absolute value of x
ARCOS(x )
the arccosine in radians of x; x should be between -1 and 1.
ARSIN(x )
the arcsine in radians of x; x should be between -1 and 1.
ATAN(x )
the arctangent in radians of x
COS(x )
the cosine of x; x is in radians.
COSH(x )
the hyperbolic cosine of x
EXP(x )
$e^x$
LOG(x )
the natural logarithm of x
LOG10(x )
the log base ten of x
LOG2(x )
the log base two of x
SIN(x )
the sine of x; x is in radians.
SINH(x )
the hyperbolic sine of x
SQRT(x )
the square root of x
TAN(x )
the tangent of x; x is in radians and is not an odd multiple of $\pi/2$.
TANH(x )
the hyperbolic tangent of x
Random-Number Functions

The MODEL procedure provides several functions for generating random numbers for Monte Carlo simulation. These functions use the same generators as the corresponding SAS DATA step functions. The following random number functions are supported: RANBIN, RANCAU, RAND, RANEXP, RANGAM, RANNOR, RANPOI, RANTBL, RANTRI, and RANUNI. For more information, refer to SAS Language: Reference.
Each reference to a random number function sets up a separate pseudo-random sequence. Note that this means that two calls to the same random function with the same seed produce identical results. This is different from the behavior of the random number functions used in the SAS DATA step. For example, the following statements produce identical values for X and Y, but Z is from an independent pseudo-random sequence:

x=rannor(123);
y=rannor(123);
z=rannor(567);
q=rand('BETA', 1, 12 );
For FIT tasks, all random number functions always return 0. For SOLVE tasks, when Monte Carlo simulation is requested, a random number function computes a new random number on the first iteration for an observation (if it is executed on that iteration) and returns that same value for all later iterations of that observation. When Monte Carlo simulation is not requested, random number functions always return 0.
Functions across Time

PROC MODEL provides four types of special built-in functions that refer to the values of variables and expressions in previous time periods. These functions have the following forms, where n represents the number of periods, x is any expression, and the optional argument i is a variable or expression that gives the lag length (0 <= i <= n):

LAGn( < i, > x )     returns the ith lag of x, where n is the maximum lag
DIFn( < i, > x )     returns the difference between the current period value of x and its ith lag
ZLAGn( < i, > x )    returns the ith lag of x, where n is the maximum lag, with missing lags replaced with zero
XLAGn( x, y )        returns the nth lag of x if x is nonmissing, or y if x is missing
ZDIFn( < i, > x )    is the difference with lag length truncated and missing values converted to zero
MOVAVGn( x )         is the moving average of x, the variable or expression to compute the moving average of

If $X_t$ denotes the observation at time point $t$, to ensure compatibility with the number $n$ of observations used to calculate the moving average MOVAVGn, the following definition is used:

$$\mathrm{MOVAVG}n(X_t) = \frac{X_t + X_{t-1} + X_{t-2} + \ldots + X_{t-n+1}}{n}$$

The moving average calculation for SAS 9.1 and earlier releases is as follows:

$$\mathrm{MOVAVG}n(X_t) = \frac{X_t + X_{t-1} + X_{t-2} + \ldots + X_{t-n}}{n+1}$$

Missing values of x are omitted in computing the average.
If you do not specify n, the number of periods is assumed to be one. For example, LAG(X) is the same as LAG1(X). No more than four digits can be used with a lagging function; that is, LAG9999 is the greatest LAG function, ZDIF9999 is the greatest ZDIF function, and so on.

The LAG functions get values from previous observations and make them available to the program. For example, LAG(X) returns the value of the variable X as it was computed in the execution of the program for the preceding observation. The expression LAG2(X+2*Y) returns the value of the expression X+2*Y, computed by using the values of the variables X and Y that were computed by the execution of the program for the observation two periods ago.

The DIF functions return the difference between the current value of a variable or expression and the value of its LAG. For example, DIF2(X) is a short way of writing X-LAG2(X), and DIF15(SQRT(2*Z)) is a short way of writing SQRT(2*Z)-LAG15(SQRT(2*Z)).

The ZLAG and ZDIF functions are like the LAG and DIF functions, but they are not counted in the determination of the program lag length, and they replace missing values with 0s. The ZLAG function returns the lagged value if the lagged value is nonmissing, or 0 if the lagged value is missing. The ZDIF function returns the differenced value if the differenced value is nonmissing, or 0 if the value of the differenced value is missing. The ZLAG function is especially useful for models with ARMA error processes. See the next section for details.
Lag Logic

The LAG and DIF lagging functions in the MODEL procedure are different from the queuing functions with the same names in the DATA step. Lags are determined by the final values that are set for the program variables by the execution of the model program for the observation. This can have upsetting consequences for programs that take lags of program variables that are given different values at various places in the program, as shown in the following statements:

temp = x + w;
t    = lag( temp );
temp = q - r;
s    = lag( temp );
The expression LAG(TEMP) always refers to LAG(Q-R), never to LAG(X+W), since Q-R is the final value assigned to the variable TEMP by the model program. If LAG(X+W) is wanted for T, it should be computed as T=LAG(X+W) and not T=LAG(TEMP), as in the preceding example.

Care should also be exercised in using the DIF functions with program variables that might be reassigned later in the program. For example, the program

temp = x;
s = dif( temp );
temp = 3 * y;

computes values for S equivalent to

s = x - lag( 3 * y );
Note that in the preceding examples, TEMP is a program variable, not a model variable. If it were a model variable, the assignments to it would be changed to assignments to a corresponding equation variable. Note that whereas LAG1(LAG1(X)) is the same as LAG2(X), DIF1(DIF1(X)) is not the same as DIF2(X). The DIF2 function is the difference between the current period value at the point in the program where the function is executed and the final value at the end of execution two periods ago; DIF2 is not the second difference. In contrast, DIF1(DIF1(X)) is equal to DIF1(X)-LAG1(DIF1(X)), which equals X–2*LAG1(X)+LAG2(X), which is the second difference of X. More information about the differences between PROC MODEL and the DATA step LAG and DIF functions is found in Chapter 3, “Working with Time Series Data.”
Lag Lengths

The lag length of the model program is the number of lags needed for any relevant equation. The program lag length controls the number of observations used to initialize the lags. PROC MODEL keeps track of the use of lags in the model program and automatically determines the lag length of each equation and of the model as a whole. PROC MODEL sets the program lag length to the maximum number of lags needed to compute any equation to be estimated, solved, or needed to compute any instrument variable used.

In determining the lag length, the ZLAG and ZDIF functions are treated as always having a lag length of 0. For example, if Y is computed as

y = lag2( x + zdif3( temp ) );
then Y has a lag length of 2 (regardless of how TEMP is defined). If Y is computed as

y = zlag2( x + dif3( temp ) );
then Y has a lag length of 0. This is so that ARMA errors can be specified without causing the loss of additional observations to the lag starting phase and so that recursive lag specifications, such as moving-average error terms, can be used. Recursive lags are not permitted unless the ZLAG or ZDIF functions are used to truncate the lag length. For example, the following statement produces an error message:

t = a + b * lag( t );
The program variable T depends recursively on its own lag, and the lag length of T is therefore undefined. In the following equation, RESID.Y depends on the predicted value for the Y equation, but the predicted value for the Y equation depends on the LAG of RESID.Y; thus, the predicted value for the Y equation depends recursively on its own lag.
y = yhat + ma * lag( resid.y );
The lag length is infinite, and PROC MODEL prints an error message and stops. Since this kind of specification is allowed, the recursion must be truncated at some point. The ZLAG and ZDIF functions do this. The following equation is valid and results in a lag length for the Y equation equal to the lag length of YHAT:

y = yhat + ma * zlag( resid.y );
Initially, the lags of RESID.Y are missing, and the ZLAG function replaces the missing residuals with 0s, their unconditional expected values. The ZLAG0 function can be used to zero out the lag length of an expression. ZLAG0(x ) returns the current period value of the expression x, if nonmissing, or else returns 0, and prevents the lag length of x from contributing to the lag length of the current statement.
Initializing Lags

At the start of each pass through the data set or BY group, the lag variables are set to missing values and an initialization is performed to fill the lags. During this phase, observations are read from the data set, and the model variables are given values from the data. If necessary, the model is executed to assign values to program variables that are used in lagging functions. The results for variables used in lag functions are saved. These observations are not included in the estimation or solution.

If, during the execution of the program for the lag starting phase, a lag function refers to lags that are missing, the lag function returns missing. Execution errors that occur while starting the lags are not reported unless requested. The modeling system automatically determines whether the program needs to be executed during the lag starting phase.

If L is the maximum lag length of any equation being fit or solved, then the first L observations are used to prime the lags. If a BY statement is used, the first L observations in the BY group are used to prime the lags. If a RANGE statement is used, the first L observations prior to the first observation requested in the RANGE statement are used to prime the lags. Therefore, there should be at least L observations in the data set.

Initial values for the lags of model variables can also be supplied in VAR, ENDOGENOUS, and EXOGENOUS statements. This feature provides initial lags of solution variables for dynamic solution when initial values for the solution variable are not available in the input data set. For example, the statement

var x 2 3 y 4 5 z 1;
feeds the initial lags exactly like these values in an input data set:
   Lag   X   Y   Z
    2    3   5   .
    1    2   4   1
If initial values for lags are available in the input data set and initial lag values are also given in a declaration statement, the values in the VAR, ENDOGENOUS, or EXOGENOUS statements take priority.

The RANGE statement is used to control the range of observations in the input data set that are processed by PROC MODEL. In the following statement, '01jan1924' specifies the starting period of the range, and '01dec1943' specifies the ending period:

   range date = '01jan1924'd to '01dec1943'd;
The observations in the data set immediately prior to the start of the range are used to initialize the lags.
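As a minimal sketch that combines these two features (the data set, parameter values, and variable names here are hypothetical, not from the original examples), the following statements supply initial lags for Y with a VAR statement and restrict the solution range with a RANGE statement:

   proc model data=in;
      parms a 0.2 b 0.8;
      var y 4 5;                /* lag1(y)=4 and lag2(y)=5 when the input
                                   data set supplies no presample values  */
      y = a + b * lag2( y );    /* equation with lag length 2             */
      range date = '01jan1924'd to '01dec1943'd;
      solve y / dynamic;
   run;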
Language Differences

For the most part, PROC MODEL programming statements work the same as they do in the DATA step as documented in SAS Language: Reference. However, there are several differences that should be noted.
DO Statement Differences

The DO statement in PROC MODEL does not allow a character index variable. Thus, the following DO statement is not valid in PROC MODEL, although it is supported in the DATA step:

   do i = 'A', 'B', 'C';   /* invalid PROC MODEL code */
IF Statement Differences

The IF statement in PROC MODEL does not allow a character-valued condition. For example, the following IF statement is not supported by PROC MODEL:

   if 'this' then statement;

Comparisons of character values are supported in IF statements, so the following IF statement is acceptable:

   if 'this' < 'that' then statement;
PROC MODEL allows for embedded conditionals in expressions. For example, the following two statements are equivalent:

   flag = if time = 1 or time = 2 then conc+30/5 + dose*time
          else if time > 5 then (0=1)
          else (patient * flag);

   if time = 1 or time = 2 then flag = conc+30/5 + dose*time;
   else if time > 5 then flag = (0=1);
   else flag = patient*flag;
Note that the ELSE operator involves only the first object or token after it, so that the following assignments are not equivalent:

   total = if sum > 0 then sum else sum + reserve;
   total = if sum > 0 then sum else (sum + reserve);
The first assignment makes TOTAL always equal to SUM plus RESERVE.
PUT Statement Differences

The PUT statement, used mostly in PROC MODEL for program debugging, supports only some of the features of the DATA step PUT statement. It also has some new features that the DATA step PUT statement does not support.

The PROC MODEL PUT statement does not support line pointers, factored lists, iteration factors, overprinting, the _INFILE_ option, or the colon (:) format modifier.

The PROC MODEL PUT statement does support expressions, but an expression must be enclosed in parentheses. For example, the following statement prints the square root of x:

   put (sqrt(x));
Subscripted array names must be enclosed in parentheses. For example, the following statement prints the ith element of the array A:

   put ( a[i] );

However, the following statement is an error:

   put a[i];
The PROC MODEL PUT statement supports the print item _PDV_ to print a formatted listing of all the variables in the program. For example, the following statement prints a much more readable listing of the variables than does the _ALL_ print item:
put _pdv_;
To print all the elements of the array A, use the following statement:

   put a;

To print all the elements of A with each value labeled by the name of the element variable, use the following statement:

   put a=;
ABORT Statement Difference

In the MODEL procedure, the ABORT statement does not allow any arguments.
SELECT/WHEN/OTHERWISE Statement Differences

The WHEN and OTHERWISE statements allow more than one target statement. That is, DO groups are not necessary for multiple-statement WHENs. For example, in PROC MODEL the following syntax is valid:

   select;
      when(exp1) stmt1;
                 stmt2;
      when(exp2) stmt3;
                 stmt4;
   end;
The ARRAY Statement

ARRAY arrayname < {dimensions} > < $ [length] > < variables and constants > ;
The ARRAY statement is used to associate a name with a list of variables and constants. The array name can then be used with subscripts in the model program to refer to the items in the list. In PROC MODEL, the ARRAY statement does not support all the features of the DATA step ARRAY statement. Implicit indexing cannot be used; all array references must have explicit subscript expressions. Only exact array dimensions are allowed; lower-bound specifications are not supported. A maximum of six dimensions is allowed. On the other hand, the ARRAY statement supported by PROC MODEL does allow both variables and constants to be used as array elements. You cannot make assignments to constant array elements. Both dimension specification and the list of elements are optional, but at least one must be supplied.
When the list of elements is not given or fewer elements than the size of the array are listed, array variables are created by suffixing element numbers to the array name to complete the element list. The following are valid PROC MODEL array statements:

   array x[120];               /* array X of length 120            */
   array q[2,2];               /* two-dimensional array Q          */
   array b[4] va vb vc vd;     /* B[2] = VB, B[4] = VD             */
   array x x1-x30;             /* array X of length 30, X[7] = X7  */
   array a[5] (1 2 3 4 5);     /* array A initialized to 1,2,3,4,5 */
RETAIN Statement

RETAIN variables initial-values ;
The RETAIN statement causes a program variable to hold its value from a previous observation until the variable is reassigned. The RETAIN statement can be used to initialize program variables. The RETAIN statement does not work for model variables, parameters, or control variables because the values of these variables are under the control of PROC MODEL and not programming statements. Use the PARMS and CONTROL statements to initialize parameters and control variables. Use the VAR, ENDOGENOUS, or EXOGENOUS statement to initialize model variables.
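As a brief sketch of RETAIN in a model program (the data set and variable names here are hypothetical), the following statements initialize the program variable CUMX to 0 and accumulate a running sum of X across observations:

   proc model data=in;
      parms a b;
      retain cumx 0;            /* program variable initialized to 0     */
      cumx = cumx + x;          /* CUMX holds its value across obs       */
      y = a + b * cumx;         /* use the retained variable             */
      fit y;
   run;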
Storing Programs in Model Files

Models can be saved in and recalled from SAS catalog files as well as XML-based data sets. SAS catalogs are special files that can store many kinds of data structures as separate units in one SAS file. Each separate unit is called an entry, and each entry has an entry type that identifies its structure to the SAS system.

Starting with SAS 9.2, model files are stored as SAS data sets instead of as members of a SAS catalog as in earlier releases. This makes model files more readily extendable in the future and enables Java-based applications to read the model files directly. You can choose between the two formats by specifying the global CMPMODEL option in an OPTIONS statement; details are given below.

In general, to save a model, use the OUTMODEL=name option in the PROC MODEL statement, where name is specified as libref.catalog.entry, libref.entry, or entry for a catalog entry and, starting with SAS 9.2, libref.datasetname or datasetname for XML-based SAS data sets. The libref, catalog, data set, and entry names must be valid SAS names no more than 32 characters long. The catalog name is restricted to seven characters on the CMS operating system. If not given, the catalog name defaults to MODELS, and the libref defaults to WORK. The entry type is always MODEL. Thus, OUTMODEL=X writes the model to the file WORK.MODELS.X.MODEL in the SAS catalog or creates a WORK.X XML-based data set in the WORK library, depending on the format chosen by using the CMPMODEL= option. By default, both formats are written.

The CMPMODEL= option can be used in an OPTIONS statement to modify the behavior when reading and writing model files. The values allowed are CMPMODEL=BOTH | XML | CATALOG. For example, the following statements restore the previous behavior:
options cmpmodel=catalog;
The CMPMODEL= option defaults to BOTH in SAS 9.2 and is intended for transitional use. If CMPMODEL=BOTH, the MODEL procedure writes both formats; when loading model files, PROC MODEL attempts to load the XML version first and the CATALOG version second (if the XML version is not found). If CMPMODEL=XML, the MODEL procedure reads and writes only the XML format. If CMPMODEL=CATALOG, only the catalog format is used.

The MODEL= option is used to read in a model. A list of model files can be specified in the MODEL= option, and a range of names with numeric suffixes can be given, as in MODEL=(MODEL1-MODEL10). When more than one model file is given, the list must be placed in parentheses, as in MODEL=(A B C), except in the case of a single name. If more than one model file is specified, the files are combined in the order listed in the MODEL= option. The MODEL procedure continues to read and write catalog MODEL files, and model files created by previous releases of SAS/ETS continue to work, so you should experience no direct impact from this change.

When the MODEL= option is specified in the PROC MODEL statement and model definition statements are also given later in the PROC MODEL step, the model files are read in first, in the order listed, and the model program specified in the PROC MODEL step is appended after the model program read from the MODEL= files. The class assigned to a variable, when multiple model files are used, is the last declaration of that variable. For example, if Y1 was declared endogenous in the model file M1 and exogenous in the model file M2, the following statement causes Y1 to be declared exogenous:

   proc model model=(m1 m2);
The INCLUDE statement can be used to append model code to the current model code. In contrast, when the MODEL= option is used in the RESET statement, the current model is deleted before the new model is read. By default, no model file is output if the PROC MODEL step performs any FIT or SOLVE tasks, or if the MODEL= option or the NOSTORE option is used. However, to ensure compatibility with previous versions of SAS/ETS software, when the PROC MODEL step does nothing but compile the model program, no input model file is read, and the NOSTORE option is not used, a model file is written. This model file is the default input file for a later PROC SYSLIN or PROC SIMLIN step. The default output model filename in this case is WORK.MODELS._MODEL_.MODEL. If FIT statements are used to estimate model parameters, the parameter estimates written to the output model file are the estimates from the last estimation performed for each parameter.
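As a minimal sketch of the store-and-recall workflow described above (the library, data set, model, and variable names here are hypothetical), a model can be saved with OUTMODEL= and later recalled with MODEL=:

   proc model data=in outmodel=mylib.invest;   /* save the model file   */
      parms a b;
      y = a + b * x;
      fit y;
   run;

   proc model data=new model=mylib.invest;     /* recall the model file */
      solve y;
   run;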
Diagnostics and Debugging PROC MODEL provides several features to aid in finding errors in the model program. These debugging features are not usually needed; most models can be developed without them.
The example model program that follows is used in the following sections to illustrate the diagnostic and debugging capabilities. This example is the estimation of a segmented model.

   /*--- Diagnostics and Debugging ---*/

   *---------Fitting a Segmented Model using MODEL-----*
   |                                                   |
   |    y        quadratic           plateau           |
   |           y=a+b*x+c*x*x          y=p              |
   |                        .....................      |
   |                    .   :                          |
   |                 .      :                          |
   |               .        :                          |
   |             .          :                          |
   |           .            :                          |
   |  +-----------------------------------------X      |
   |                        x0                         |
   |                                                   |
   | continuity restriction:  p=a+b*x0+c*x0**2         |
   | smoothness restriction: 0=b+2*c*x0 so x0=-b/(2*c) |
   *---------------------------------------------------*;

   title 'QUADRATIC MODEL WITH PLATEAU';

   data a;
      input y x @@;
   datalines;
   .46  1  .47  2  .57  3  .61  4  .62  5  .68  6  .69  7
   .78  8  .70  9  .74 10  .77 11  .78 12  .74 13  .80 14
   .80 15  .78 16
   ;

   proc model data=a list xref listcode;
      parms a 0.45 b 0.5 c -0.0025;

      x0 = -.5*b / c;             /* join point */

      if x < x0 then              /* Quadratic part of model */
         y = a + b*x + c*x*x;
      else                        /* Plateau part of model */
         y = a + b*x0 + c*x0*x0;

      fit y;
   run;
Program Listing The LIST option produces a listing of the model program. The statements are printed one per line with the original line number and column position of the statement. The program listing from the example program is shown in Figure 18.85.
Figure 18.85 LIST Output for Segmented Model

              QUADRATIC MODEL WITH PLATEAU
                 The MODEL Procedure

            Listing of Compiled Program Code
   Stmt   Line:Col   Statement as Parsed
     1    3930:4     x0 = (-0.5 * b) / c;
     2    3931:4     if x < x0 then
     3    3932:7     PRED.y = a + b * x + c * x * x;
     3    3932:7     RESID.y = PRED.y - ACTUAL.y;
     3    3932:7     ERROR.y = PRED.y - y;
     4    3933:4     else
     5    3934:7     PRED.y = a + b * x0 + c * x0 * x0;
     5    3934:7     RESID.y = PRED.y - ACTUAL.y;
     5    3934:7     ERROR.y = PRED.y - y;
The LIST option also shows the model translations that PROC MODEL performs. LIST output is useful for understanding the code generated by the %AR and the %MA macros.
Cross-Reference

The XREF option produces a cross-reference listing of the variables in the model program. The XREF listing is usually used in conjunction with the LIST option. The XREF listing does not include derivative (@-prefixed) variables. The XREF listing does not include generated assignments to equation variables, PRED.-, RESID.-, and ERROR.-prefixed variables, unless the DETAILS option is used. The cross-reference from the example program is shown in Figure 18.86.

Figure 18.86 XREF Output for Segmented Model

              QUADRATIC MODEL WITH PLATEAU
                 The MODEL Procedure

           Cross Reference Listing For Program
   Symbol    Kind   Type   References (statement)/(line):(col)
   a         Var    Num    Used: 3/54587:13 5/54589:13
   b         Var    Num    Used: 1/54585:12 3/54587:16 5/54589:16
   c         Var    Num    Used: 1/54585:15 3/54587:22 5/54589:23
   x0        Var    Num    Assigned: 1/54585:15
                           Used: 2/54586:11 5/54589:16 5/54589:23 5/54589:26
   x         Var    Num    Used: 2/54586:11 3/54587:16 3/54587:22 3/54587:24
   PRED.y    Var    Num    Assigned: 3/54587:19 5/54589:20
Compiler Listing

The LISTCODE option lists the model code and derivatives tables produced by the compiler. This listing is useful only for debugging and should not normally be needed. LISTCODE prints the operator and operands of each operation generated by the compiler for each model program statement. Many of the operands are temporary variables generated by the compiler and given names such as #temp1. When derivatives are taken, the code listing includes the operations generated for the derivatives calculations. The derivatives tables are also listed. A LISTCODE option prints the transformed equations from the example shown in Figure 18.87 and Figure 18.88.

Figure 18.87 LISTCODE Output for Segmented Model—Statements as Parsed

                     Derivatives
   WRT-Variable   Object-Variable   Derivative-Variable
   a              RESID.y           @RESID.y/@a
   b              RESID.y           @RESID.y/@b
   c              RESID.y           @RESID.y/@c

            Listing of Compiled Program Code
   Stmt   Line:Col   Statement as Parsed
     1    3930:4     x0 = (-0.5 * b) / c;
     1    3930:4     @x0/@b = -0.5 / c;
     1    3930:4     @x0/@c = - x0 / c;
     2    3931:4     if x < x0 then
     3    3932:7     PRED.y = a + b * x + c * x * x;
     3    3932:7     @PRED.y/@a = 1;
     3    3932:7     @PRED.y/@b = x;
     3    3932:7     @PRED.y/@c = x * x;
     3    3932:7     RESID.y = PRED.y - ACTUAL.y;
     3    3932:7     @RESID.y/@a = @PRED.y/@a;
     3    3932:7     @RESID.y/@b = @PRED.y/@b;
     3    3932:7     @RESID.y/@c = @PRED.y/@c;
     3    3932:7     ERROR.y = PRED.y - y;
     4    3933:4     else
     5    3934:7     PRED.y = a + b * x0 + c * x0 * x0;
     5    3934:7     @PRED.y/@a = 1;
     5    3934:7     @PRED.y/@b = x0 + b * @x0/@b + (c * @x0/@b * x0 + c * x0 * @x0/@b);
     5    3934:7     @PRED.y/@c = b * @x0/@c + ((x0 + c * @x0/@c) * x0 + c * x0 * @x0/@c);
     5    3934:7     RESID.y = PRED.y - ACTUAL.y;
     5    3934:7     @RESID.y/@a = @PRED.y/@a;
     5    3934:7     @RESID.y/@b = @PRED.y/@b;
     5    3934:7     @RESID.y/@c = @PRED.y/@c;
     5    3934:7     ERROR.y = PRED.y - y;
Figure 18.88 LISTCODE Output for Segmented Model—Compiled Code

[Compiled-code listing: for each program statement (the ASSIGN at line 3930 column 4 with source text "x0 = -.5*b / c;", the IF at line 3931, and the ASSIGN statements at lines 3932 through 3934), the listing shows the statement type, the source line and column, the source text, and each operation (Oper) that the compiler generates, including compiler-created temporary variables such as _temp1.]
                          Approx              Approx
   Parameter   Estimate   Std Err   t Value  Pr > |t|  Label
   ge_int      -26.839    32.0908     -0.84    0.4153  GE Intercept
   ge_f        0.038226    0.0150      2.54    0.0217  GE Lagged Share Value Coef
   ge_c        0.137099    0.0352      3.90    0.0013  GE Lagged Capital Stock Coef
   wh_int      3.680835    9.5448      0.39    0.7048  WH Intercept
   wh_f        0.049156    0.0172      2.85    0.0115  WH Lagged Share Value Coef
   wh_c        0.067271    0.0708      0.95    0.3559  WH Lagged Capital Stock Coef
   gei_m1      -0.87615    0.1614     -5.43    <.0001  MA(gei) gei lag1 parameter

                          Approx              Approx
   Parameter   Estimate   Std Err   t Value  Pr > |t|  Label
   ge_int      -25.002    34.2933     -0.73    0.4765  GE Intercept
   ge_f        0.03712     0.0161      2.30    0.0351  GE Lagged Share Value Coef
   ge_c        0.137788    0.0380      3.63    0.0023  GE Lagged Capital Stock Coef
   wh_int      2.946761    9.5638      0.31    0.7620  WH Intercept
   wh_f        0.050395    0.0174      2.89    0.0106  WH Lagged Share Value Coef
   wh_c        0.066531    0.0729      0.91    0.3749  WH Lagged Capital Stock Coef
   gei_m1      -0.78516    0.1942     -4.04    0.0009  MA(gei) gei lag1 parameter
   whi_m1      -0.69389    0.2540     -2.73    0.0148  MA(whi) whi lag1 parameter
Output 18.4.4 PROC ARIMA Results by Using ML Estimation

        Example of MA(1) Error Process Using Grunfeld's Model
               PROC ARIMA Using Maximum Likelihood

                      The ARIMA Procedure
                 Maximum Likelihood Estimation

                        Standard             Approx
   Parameter  Estimate     Error   t Value  Pr > |t|  Lag  Variable  Shift
   MU          2.95645   9.20752      0.32    0.7481    0  whi           0
   MA1,1      -0.69305   0.25307     -2.74    0.0062    1  whi           0
   NUM1        0.05036   0.01686      2.99    0.0028    0  whf           0
   NUM2        0.06672   0.06939      0.96    0.3363    0  whc           0
Output 18.4.4 continued

   Constant Estimate       2.956449
   Variance Estimate       81.29645
   Std Error Estimate      9.016455
   AIC                     148.9113
   SBC                     152.8942
   Number of Residuals           20
Example 18.5: Polynomial Distributed Lags by Using %PDL

This example shows the use of the %PDL macro for polynomial distributed lag models. Simulated data is generated so that Y is a linear function of six lags of X, with the lag coefficients following a quadratic polynomial. The model is estimated by using a fourth-degree polynomial, both with and without endpoint constraints. The example uses simulated data generated from the following model:

$$y_t = 10 + \sum_{z=0}^{6} f(z)\, x_{t-z} + \epsilon$$

$$f(z) = -5z^2 + 1.5z$$
The LIST option prints the model statements added by the %PDL macro. The following statements generate simulated data as shown:

   /*--------------------------------------------------------------*/
   /* Generate Simulated Data for a Linear Model with a PDL on X   */
   /*   y = 10 + x(6,2) + e                                        */
   /*   pdl(x) = -5.*(lg)**2 + 1.5*(lg) + 0.                       */
   /*--------------------------------------------------------------*/
   data pdl;
      pdl2=-5.; pdl1=1.5; pdl0=0;
      array zz(i) z0-z6;
      do i=1 to 7;
         z=i-1;
         zz=pdl2*z**2 + pdl1*z + pdl0;
      end;
      do n=-11 to 30;
         x =10*ranuni(1234567)-5;
         pdl=z0*x + z1*xl1 + z2*xl2 + z3*xl3
             + z4*xl4 + z5*xl5 + z6*xl6;
         e =10*rannor(1234567);
         y =10+pdl+e;
         if n>=1 then output;
         xl6=xl5; xl5=xl4; xl4=xl3; xl3=xl2; xl2=xl1; xl1=x;
      end;
   run;

   title1 'Polynomial Distributed Lag Example';
   title3 'Estimation of PDL(6,4) Model-- No Endpoint Restrictions';
   proc model data=pdl;
      parms int;                     /* declare the intercept parameter */
      %pdl( xpdl, 6, 4 )             /* declare the lag distribution    */
      y = int + %pdl( xpdl, x );     /* define the model equation       */
      fit y / list;                  /* estimate the parameters         */
   run;
The LIST output for the model without endpoint restrictions is shown in Output 18.5.1. The first seven statements in the generated program are the polynomial expressions for lag parameters XPDL_L0 through XPDL_L6. The estimated parameters are INT, XPDL_0, XPDL_1, XPDL_2, XPDL_3, and XPDL_4.

Output 18.5.1 PROC MODEL Listing of Generated Program

            Polynomial Distributed Lag Example
   Estimation of PDL(6,4) Model-- No Endpoint Restrictions

                  The MODEL Procedure

            Listing of Compiled Program Code
   Stmt   Line:Col   Statement as Parsed
     1    4370:14    XPDL_L0 = XPDL_0;
     2    4370:14    XPDL_L1 = XPDL_0 + XPDL_1 + XPDL_2 + XPDL_3 + XPDL_4;
     3    4370:14    XPDL_L2 = XPDL_0 + XPDL_1 * 2 + XPDL_2 * 2 ** 2 + XPDL_3 * 2 ** 3 + XPDL_4 * 2 ** 4;
     4    4370:14    XPDL_L3 = XPDL_0 + XPDL_1 * 3 + XPDL_2 * 3 ** 2 + XPDL_3 * 3 ** 3 + XPDL_4 * 3 ** 4;
     5    4370:14    XPDL_L4 = XPDL_0 + XPDL_1 * 4 + XPDL_2 * 4 ** 2 + XPDL_3 * 4 ** 3 + XPDL_4 * 4 ** 4;
     6    4370:14    XPDL_L5 = XPDL_0 + XPDL_1 * 5 + XPDL_2 * 5 ** 2 + XPDL_3 * 5 ** 3 + XPDL_4 * 5 ** 4;
     7    4370:14    XPDL_L6 = XPDL_0 + XPDL_1 * 6 + XPDL_2 * 6 ** 2 + XPDL_3 * 6 ** 3 + XPDL_4 * 6 ** 4;
     8    4371:4     PRED.y = int + XPDL_L0 * x + XPDL_L1 * LAG1( x ) + XPDL_L2 * LAG2( x ) + XPDL_L3 * LAG3( x ) + XPDL_L4 * LAG4( x ) + XPDL_L5 * LAG5( x ) + XPDL_L6 * LAG6( x );
     8    4371:4     RESID.y = PRED.y - ACTUAL.y;
     8    4371:4     ERROR.y = PRED.y - y;
     9    4370:15    ESTIMATE XPDL_L0, XPDL_L1, XPDL_L2, XPDL_L3, XPDL_L4, XPDL_L5, XPDL_L6;
    10    4370:15    _est0 = XPDL_L0;
    11    4370:15    _est1 = XPDL_L1;
    12    4370:15    _est2 = XPDL_L2;
    13    4370:15    _est3 = XPDL_L3;
    14    4370:15    _est4 = XPDL_L4;
    15    4370:15    _est5 = XPDL_L5;
    16    4370:14    _est6 = XPDL_L6;
The FIT results for the model without endpoint restrictions are shown in Output 18.5.2.

Output 18.5.2 PROC MODEL Results That Specify No Endpoint Restrictions

            Polynomial Distributed Lag Example
   Estimation of PDL(6,4) Model-- No Endpoint Restrictions

                  The MODEL Procedure

           Nonlinear OLS Summary of Residual Errors
               DF      DF
   Equation  Model   Error     SSE     MSE  Root MSE  R-Square  Adj R-Sq
   y             6      18  2070.8   115.0   10.7259    0.9998    0.9998

                Nonlinear OLS Parameter Estimates
                           Approx             Approx
   Parameter   Estimate   Std Err   t Value  Pr > |t|  Label
   int         9.621969    2.3238      4.14    0.0006
   XPDL_0      0.084374    0.7587      0.11    0.9127  PDL(XPDL,6,4) parameter for (L)**0
   XPDL_1      0.749956    2.0936      0.36    0.7244  PDL(XPDL,6,4) parameter for (L)**1
   XPDL_2      -4.196      1.6215     -2.59    0.0186  PDL(XPDL,6,4) parameter for (L)**2
   XPDL_3      -0.21489    0.4253     -0.51    0.6195  PDL(XPDL,6,4) parameter for (L)**3
   XPDL_4      0.016133    0.0353      0.46    0.6528  PDL(XPDL,6,4) parameter for (L)**4
Portions of the output produced by the following PDL model, with the endpoints of the model restricted to zero, are presented in Output 18.5.3.

   title3 'Estimation of PDL(6,4) Model-- Both Endpoint Restrictions';

   proc model data=pdl;
      parms int;                     /* declare the intercept parameter */
      %pdl( xpdl, 6, 4, r=both )     /* declare the lag distribution    */
      y = int + %pdl( xpdl, x );     /* define the model equation       */
      fit y / list;                  /* estimate the parameters         */
   run;
Output 18.5.3 PROC MODEL Results Specifying Both Endpoint Restrictions

            Polynomial Distributed Lag Example
  Estimation of PDL(6,4) Model-- Both Endpoint Restrictions

                  The MODEL Procedure

           Nonlinear OLS Summary of Residual Errors
               DF      DF
   Equation  Model   Error     SSE      MSE  Root MSE  R-Square  Adj R-Sq
   y             4      20  449868  22493.4     150.0    0.9596    0.9535

                Nonlinear OLS Parameter Estimates
                           Approx             Approx
   Parameter   Estimate   Std Err   t Value  Pr > |t|
   int         17.08581   32.4032      0.53    0.6038
   XPDL_2      13.88433    5.4361      2.55    0.0189
   XPDL_3      -9.3535     1.7602     -5.31    <.0001
   Parameter   DF   Estimate   Standard Error   t Value   Approx Pr > |t|
   CS1          1   0.945589           0.4579      2.06            0.0416
   CS2          1   2.475449           0.4582      5.40
CENSORED < (censored-options) >

specifies that the endogenous variables in this statement be censored. Valid censored-options are as follows:

LB=value or variable
LOWERBOUND=value or variable
   specifies the lower bound of the censored variables. If value is missing or the value in variable is missing, no lower bound is set. By default, no lower bound is set.

UB=value or variable
UPPERBOUND=value or variable
   specifies the upper bound of the censored variables. If value is missing or the value in variable is missing, no upper bound is set. By default, no upper bound is set.
Truncated Variable Options

TRUNCATED < (truncated-options) >

specifies that the endogenous variables in this statement be truncated. Valid truncated-options are as follows:

LB=value or variable
LOWERBOUND=value or variable
   specifies the lower bound of the truncated variables. If value is missing or the value in variable is missing, no lower bound is set. By default, no lower bound is set.

UB=value or variable
UPPERBOUND=value or variable
   specifies the upper bound of the truncated variables. If value is missing or the value in variable is missing, no upper bound is set. By default, no upper bound is set.
Stochastic Frontier Variable Options

FRONTIER < (frontier-options) >

specifies that the endogenous variable in this statement follow a production or cost frontier. Valid frontier-options are as follows:

TYPE=HALF | EXPONENTIAL | TRUNCATED
   HALF specifies the half-normal model; EXPONENTIAL specifies the exponential model; TRUNCATED specifies the truncated normal model.
PRODUCTION
   specifies that the model estimated be a production function.

COST
   specifies that the model estimated be a cost function.

If neither the PRODUCTION nor the COST option is specified, a production function is estimated by default.
Selection Options

SELECT (select-option)

specifies selection criteria for the sample selection model. The select-option specifies the condition for the endogenous variable to be selected. It is written as a variable name, followed by an equality operator (=) or an inequality operator (<, >, <=, >=), followed by a number:

   variable operator number

The variable is the endogenous variable that the selection is based on. The operator can be =, <, >, <=, or >=. Multiple select-options can be combined with the logical operators AND and OR. The following example illustrates the use of the SELECT option:

   endogenous y1 ~ select(z=0);
   endogenous y2 ~ select(z=1 or z=2);

The SELECT option can be used together with the DISCRETE, CENSORED, or TRUNCATED option. For example:

   endogenous y1 ~ select(z=0) discrete;
   endogenous y2 ~ select(z=1) censored (lb=0);
   endogenous y3 ~ select(z=1 or z=2) truncated (ub=10);
For more details about selection models with censoring or truncation, see the section “Selection Models” on page 1455.
FREQ Statement

FREQ variable ;
The FREQ statement identifies a variable that contains the frequency of occurrence of each observation. PROC QLIM treats each observation as if it appears n times, where n is the value of the FREQ variable for the observation. If it is not an integer, the frequency value is truncated to an integer. If the frequency value is less than 1 or missing, the observation is not used in the model fitting. When the FREQ statement is not specified, each observation is assigned a frequency of 1. If you specify more than one FREQ statement, then the first FREQ statement is used.
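For example, the following sketch (the data set and variable names here are hypothetical) fits a model in which the variable COUNT gives the number of times each observation occurs:

   proc qlim data=grouped;
      model y = x1 x2 / discrete;
      freq count;
   run;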
HETERO Statement

HETERO dependent-variables ~ exogenous-variables < / options > ;
The HETERO statement specifies variables that are related to the heteroscedasticity of the residuals and the way these variables are used to model the error variance. The heteroscedastic regression model supported by PROC QLIM is

$$y_i = \mathbf{x}_i'\boldsymbol\beta + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_i^2)$$

See the section “Heteroscedasticity” on page 1452 for more details on the specification of functional forms.

LINK=value
   The functional form can be specified using the LINK= option. The following option values are allowed:

   EXP specifies the exponential link function

   $$\sigma_i^2 = \sigma^2 \left(1 + \exp(\mathbf{z}_i'\boldsymbol\gamma)\right)$$

   LINEAR specifies the linear link function

   $$\sigma_i^2 = \sigma^2 \left(1 + \mathbf{z}_i'\boldsymbol\gamma\right)$$

   When the LINK= option is not specified, the exponential link function is specified by default.

NOCONST
   specifies that there be no constant in the linear or exponential heteroscedasticity model:

   $$\sigma_i^2 = \sigma^2 (\mathbf{z}_i'\boldsymbol\gamma) \qquad \sigma_i^2 = \sigma^2 \exp(\mathbf{z}_i'\boldsymbol\gamma)$$

SQUARE
   estimates the model by using the square of the linear heteroscedasticity function. For example, you can specify the following heteroscedasticity function:

   $$\sigma_i^2 = \sigma^2 \left(1 + (\mathbf{z}_i'\boldsymbol\gamma)^2\right)$$

      model y = x1 x2 / discrete;
      hetero y ~ z1 / link=linear square;
The SQUARE option does not apply to the exponential heteroscedasticity function, because the square of an exponential function of $\mathbf{z}_i'\boldsymbol\gamma$ is the same as the exponential of $2\mathbf{z}_i'\boldsymbol\gamma$. Hence the only difference would be that all estimates of $\boldsymbol\gamma$ are divided by two.
INIT Statement

INIT initvalue1 < , initvalue2 . . . > ;

The INIT statement is used to set initial values for parameters in the optimization. Any number of INIT statements can be specified. Each initvalue is written as a parameter or parameter list, followed by an optional equality operator (=), followed by a number:

   parameter < = > number
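For example, the following sketch (the data set and variable names here are hypothetical) supplies starting values for the coefficients of X1 and X2:

   proc qlim data=a;
      init x1 0.5, x2 = -1;   /* parameters are referred to by the names
                                 of the corresponding regressors         */
      model y = x1 x2 / discrete;
   run;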
MODEL Statement

MODEL dependent = regressors < / options > ;
The MODEL statement specifies the dependent variable and independent regressor variables for the regression model. The following options can be used in the MODEL statement after a slash (/):

LIMIT1=value
   specifies the restriction of the threshold value of the first category when the ordinal probit or logit model is estimated. LIMIT1=ZERO is the default option. When LIMIT1=VARYING is specified, the threshold value is estimated.

NOINT
   suppresses the intercept parameter.
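For example, the following sketch (the data set and variable names here are hypothetical) estimates an ordinal model in which the first threshold is estimated rather than fixed at zero:

   proc qlim data=a;
      model y = x1 x2 / discrete limit1=varying;
   run;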
Endogenous Variable Options The endogenous variable options are the same as the options specified in the ENDOGENOUS statement. If an endogenous variable has an endogenous option specified in both the MODEL statement and the ENDOGENOUS statement, the option in the ENDOGENOUS statement is used.
BOXCOX Estimation Options

BOXCOX (option-list)

specifies options that are used for Box-Cox regression or regressor transformation. For example, the Box-Cox regression is specified as

   model y = x1 x2 / boxcox(y=lambda, x1 x2);
PROC QLIM estimates the following Box-Cox regression model:

$$y_i^{(\lambda)} = \beta_0 + \beta_1 x_{1i}^{(\lambda_2)} + \beta_2 x_{2i}^{(\lambda_2)} + \epsilon_i$$

The option-list takes the form variable-list < = varname >, separated by commas. The variable-list specifies the list of variables that have the same Box-Cox transformation; varname specifies the name of this Box-Cox coefficient. If varname is not specified, the coefficient is called _Lambdai, where i increments sequentially.
NLOPTIONS Statement

NLOPTIONS < options > ;
PROC QLIM uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks. For a list of all the options of the NLOPTIONS statement, see Chapter 6, “Nonlinear Optimization Methods.”
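As a brief sketch (the data set and variable names are hypothetical, and MAXITER= and TECHNIQUE= are assumed here to be among the standard nonlinear optimization options documented in Chapter 6), the following statements request a quasi-Newton technique with an iteration limit:

   proc qlim data=a;
      model y = x1 x2 / discrete;
      nloptions maxiter=500 technique=quanew;
   run;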
OUTPUT Statement

OUTPUT < OUT=SAS-data-set > < output-options > ;

The OUTPUT statement creates a new SAS data set containing all variables in the input data set and, optionally, the estimates of $\mathbf{x}'\boldsymbol\beta$, the predicted value, the residual, marginal effects, probability, the standard deviation of the error, the expected value, the conditional expected value, technical efficiency measures, and the inverse Mills ratio. When the response values are missing for the observation, all output estimates except the residual are still computed as long as none of the explanatory variables is missing. This enables you to compute these statistics for prediction. You can specify only one OUTPUT statement. Details on the specifications in the OUTPUT statement are as follows:

CONDITIONAL
   outputs estimates of conditional expected values of continuous endogenous variables.

ERRSTD
   outputs estimates of $\sigma_j$, the standard deviation of the error term.

EXPECTED
   outputs estimates of expected values of continuous endogenous variables.

MARGINAL
   outputs marginal effects.

MILLS
   outputs estimates of inverse Mills ratios of censored or truncated continuous, binary discrete, and selection endogenous variables.
OUT=SAS-data-set
   names the output data set.

PREDICTED
   outputs estimates of predicted endogenous variables.

PROB
   outputs estimates of the probability of discrete endogenous variables taking the current observed responses.

PROBALL
   outputs estimates of the probability of discrete endogenous variables for all possible responses.

RESIDUAL
   outputs estimates of residuals of continuous endogenous variables.

XBETA
   outputs estimates of $\mathbf{x}'\boldsymbol\beta$.

TE1
   outputs estimates of technical efficiency for each producer in the stochastic frontier model suggested by Battese and Coelli (1988).

TE2
   outputs estimates of technical efficiency for each producer in the stochastic frontier model suggested by Jondrow et al. (1982).
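For example, the following sketch (the data set and variable names here are hypothetical) writes predicted values, residuals, inverse Mills ratios, and the structural part $\mathbf{x}'\boldsymbol\beta$ to the data set PRED:

   proc qlim data=a;
      model y = x1 x2;
      endogenous y ~ censored(lb=0);
      output out=pred predicted residual mills xbeta;
   run;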
RESTRICT Statement

RESTRICT restriction1 < , restriction2 . . . > ;

The RESTRICT statement is used to impose linear restrictions on the parameter estimates. Any number of RESTRICT statements can be specified, but the number of restrictions imposed is limited by the number of regressors. Each restriction is written as an expression, followed by an equality operator (=) or an inequality operator (<, >, <=, >=), followed by a second expression:

   expression operator expression

The operator can be =, <, >, <=, or >=. The operator and second expression are optional.

Restriction expressions can be composed of parameter names; multiplication (*), addition (+), and subtraction (-) operators; and constants. Parameters named in restriction expressions must be among the parameters estimated by the model. Parameters associated with a regressor variable are referred to by the name of the corresponding regressor variable. The restriction expressions must be a linear function of the parameters.

The following is an example of the use of the RESTRICT statement:
   proc qlim data=one;
      model y = x1-x10 / discrete;
      restrict x1*2 <= x2 + x3;
   run;

WEIGHT Statement

WEIGHT variable < / option > ;
The WEIGHT statement specifies a variable to supply weighting values to use for each observation in estimating parameters. The log likelihood for each observation is multiplied by the corresponding weight variable value. If the weight of an observation is nonpositive, that observation is not used in the estimation. The following option can be added to the WEIGHT statement after a slash (/).
NONORMALIZE
specifies that the weights be used as is. When this option is not specified, the weights are normalized so that they add up to the actual sample size. Weights $w_i$ are normalized by multiplying them by

$$\frac{n}{\sum_{i=1}^{n} w_i}$$

where $n$ is the sample size.
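For example, the following sketch (the data set and variable names here are hypothetical) uses the weights in W exactly as given:

   proc qlim data=a;
      weight w / nonormalize;
      model y = x1 x2;
   run;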
Details: QLIM Procedure
Ordinal Discrete Choice Modeling

Binary Probit and Logit Model

The binary choice model is

$$y_i^* = \mathbf{x}_i'\boldsymbol\beta + \epsilon_i$$

where the value of the latent dependent variable, $y_i^*$, is observed only as follows:

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

The disturbance, $\epsilon_i$, of the probit model has a standard normal distribution with the cumulative distribution function (CDF)

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-t^2/2)\, dt$$

The disturbance of the logit model has a standard logistic distribution with the CDF

$$\Lambda(x) = \frac{\exp(x)}{1 + \exp(x)} = \frac{1}{1 + \exp(-x)}$$

The binary discrete choice model has the following probability that the event $\{y_i = 1\}$ occurs:

$$P(y_i = 1) = F(\mathbf{x}_i'\boldsymbol\beta) = \begin{cases} \Phi(\mathbf{x}_i'\boldsymbol\beta) & \text{(probit)} \\ \Lambda(\mathbf{x}_i'\boldsymbol\beta) & \text{(logit)} \end{cases}$$

The log-likelihood function is

$$\ell = \sum_{i=1}^{N} \left\{ y_i \log\left[F(\mathbf{x}_i'\boldsymbol\beta)\right] + (1 - y_i)\log\left[1 - F(\mathbf{x}_i'\boldsymbol\beta)\right] \right\}$$

where the CDF $F(x)$ is defined as $\Phi(x)$ for the probit model and as $\Lambda(x)$ for the logit model. The first-order derivative of the logit model is

$$\frac{\partial \ell}{\partial \boldsymbol\beta} = \sum_{i=1}^{N} \left( y_i - \Lambda(\mathbf{x}_i'\boldsymbol\beta) \right) \mathbf{x}_i$$

The probit model has more complicated derivatives:

$$\frac{\partial \ell}{\partial \boldsymbol\beta} = \sum_{i=1}^{N} \frac{(2y_i - 1)\,\phi(\mathbf{x}_i'\boldsymbol\beta)}{\Phi\left((2y_i - 1)\,\mathbf{x}_i'\boldsymbol\beta\right)}\, \mathbf{x}_i = \sum_{i=1}^{N} r_i \mathbf{x}_i$$

where

$$r_i = \frac{(2y_i - 1)\,\phi(\mathbf{x}_i'\boldsymbol\beta)}{\Phi\left((2y_i - 1)\,\mathbf{x}_i'\boldsymbol\beta\right)}$$

Note that the logit maximum likelihood estimates are $\pi/\sqrt{3}$ times greater than the probit maximum likelihood estimates, since the probit parameter estimates, $\boldsymbol\beta$, are standardized and the error term with logistic distribution has a variance of $\pi^2/3$.
x0i ˇ
The probability that the unobserved dependent variable is contained in the j th category can be written as P Œj
1
< yi j D F .Ri;j /
F .Ri;j
1/
The log-likelihood function is `D
N X M X
dij log F .Ri;j /
F .Ri;j
1/
i D1 j D1
where dij D
1 ifj 1 < yi j 0 otherwise
The first derivatives are written as N X M X f .Ri;j 1 / f .Ri;j / @` D dij xi @ˇ F .Ri;j / F .Ri;j 1 / i D1 j D1
Ordinal Discrete Choice Modeling F 1445
N X M X ıj;k f .Ri;j / ıj 1;k f .Ri;j @` D dij @k F .Ri;j / F .Ri;j 1 /
1/
i D1 j D1
where f .x/ D dFdx.x/ and ıj;k D 1 if j D k. When the ordinal probit is estimated, it is assumed that F .Ri;j / D ˆ.Ri;j /. The ordinal logit model is estimated if F .Ri;j / D ƒ.Ri;j /. The first threshold parameter, 1 , is estimated when the LIMIT1=VARYING option is specified. By default (LIMIT1=ZERO), so that M 2 threshold parameters (2 ; : : : ; M 1 ) are estimated. The ordered probit models are analyzed by Aitchison and Silvey (1957), and Cox (1970) discussed ordered response data by using the logit model. They defined the probability that yi belongs to j th category as P Œj
1
< yi j D F .j C x0i /
F .j
1
C x0i /
where 0 D 1 and M D 1. Therefore, the ordered response model analyzed by Aitchison and Silvey can be estimated if the LIMIT1=VARYING option is specified. Note that D ˇ.
Goodness-of-Fit Measures

The goodness-of-fit measures discussed in this section apply only to discrete dependent variable models. McFadden (1974) suggested a likelihood ratio index that is analogous to the $R^2$ in the linear regression model:

$$R_M^2 = 1 - \frac{\ln L}{\ln L_0}$$

where $L$ is the value of the maximum likelihood function and $L_0$ is the likelihood function when all regression coefficients except an intercept term are zero. It can be shown that $\ln L_0$ can be written as

$$\ln L_0 = \sum_{j=1}^{M} N_j \ln\left(\frac{N_j}{N}\right)$$

where $N_j$ is the number of responses in category $j$.

Estrella (1998) proposes the following requirements for a goodness-of-fit measure to be desirable in discrete choice modeling:

- The measure must take values in $[0, 1]$, where 0 represents no fit and 1 corresponds to perfect fit.
- The measure should be directly related to the valid test statistic for the significance of all slope coefficients.
- The derivative of the measure with respect to the test statistic should comply with corresponding derivatives in a linear regression.
Estrella's (1998) measure is written

$$R_{E1}^2 = 1 - \left(\frac{\ln L}{\ln L_0}\right)^{-\frac{2}{N}\ln L_0}$$

An alternative measure suggested by Estrella (1998) is

$$R_{E2}^2 = 1 - \left(\frac{\ln L - K}{\ln L_0}\right)^{-\frac{2}{N}\ln L_0}$$

where $\ln L_0$ is computed with null slope parameter values, $N$ is the number of observations used, and $K$ represents the number of estimated parameters.

Other goodness-of-fit measures are summarized as follows:

$$R_{CU1}^2 = 1 - \left(\frac{L_0}{L}\right)^{\frac{2}{N}} \quad \text{(Cragg-Uhler 1)}$$

$$R_{CU2}^2 = \frac{1 - (L_0/L)^{2/N}}{1 - L_0^{2/N}} \quad \text{(Cragg-Uhler 2)}$$

$$R_A^2 = \frac{2(\ln L - \ln L_0)}{2(\ln L - \ln L_0) + N} \quad \text{(Aldrich-Nelson)}$$

$$R_{VZ}^2 = R_A^2\, \frac{2\ln L_0 - N}{2\ln L_0} \quad \text{(Veall-Zimmermann)}$$

$$R_{MZ}^2 = \frac{\sum_{i=1}^{N}\left(\hat y_i - \bar{\hat y}\right)^2}{N + \sum_{i=1}^{N}\left(\hat y_i - \bar{\hat y}\right)^2} \quad \text{(McKelvey-Zavoina)}$$

where $\hat y_i = \mathbf{x}_i'\hat{\boldsymbol\beta}$ and $\bar{\hat y} = \sum_{i=1}^{N} \hat y_i / N$.
Limited Dependent Variable Models

Censored Regression Models

When the dependent variable is censored, values in a certain range are all transformed to a single value. For example, the standard tobit model can be defined as

$$y_i^* = \mathbf{x}_i'\boldsymbol\beta + \epsilon_i$$

$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$

where $\epsilon_i \sim \text{iid } N(0, \sigma^2)$. The log-likelihood function of the standard censored regression model is

$$\ell = \sum_{i \in \{y_i = 0\}} \ln\left[1 - \Phi(\mathbf{x}_i'\boldsymbol\beta/\sigma)\right] + \sum_{i \in \{y_i > 0\}} \ln\left[\phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\Big/\sigma\right]$$
where $\Phi(\cdot)$ is the cumulative distribution function and $\phi(\cdot)$ is the probability density function of the standard normal distribution.

The tobit model can be generalized to handle observation-by-observation censoring. The model censored at both the lower and upper limits can be defined as

$$y_i = \begin{cases} R_i & \text{if } y_i^* \ge R_i \\ y_i^* & \text{if } L_i < y_i^* < R_i \\ L_i & \text{if } y_i^* \le L_i \end{cases}$$

The log-likelihood function can be written as

$$\ell = \sum_{i \in \{L_i < y_i < R_i\}} \ln\left[\phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\Big/\sigma\right] + \sum_{i \in \{y_i = L_i\}} \ln \Phi\left(\frac{L_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right) + \sum_{i \in \{y_i = R_i\}} \ln\left[1 - \Phi\left(\frac{R_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\right]$$

Type 2 Tobit

The Type 2 Tobit model is defined as

$$y_{1i}^* = \mathbf{x}_{1i}'\boldsymbol\beta_1 + u_{1i}$$
$$y_{2i}^* = \mathbf{x}_{2i}'\boldsymbol\beta_2 + u_{2i}$$

$$y_{1i} = \begin{cases} 1 & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \qquad
y_{2i} = \begin{cases} y_{2i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases}$$

where $(u_{1i}, u_{2i}) \sim N(0, \Sigma)$. The likelihood function is described as $P(y_1 < 0)\,P(y_1 > 0, y_2)$.

Type 3 Tobit

The Type 3 Tobit model is different from the Type 2 Tobit in that $y_{1i}^*$ of the Type 3 Tobit is observed when $y_{1i}^* > 0$:

$$y_{1i}^* = \mathbf{x}_{1i}'\boldsymbol\beta_1 + u_{1i}$$
$$y_{2i}^* = \mathbf{x}_{2i}'\boldsymbol\beta_2 + u_{2i}$$

$$y_{1i} = \begin{cases} y_{1i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \qquad
y_{2i} = \begin{cases} y_{2i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases}$$

where $(u_{1i}, u_{2i})' \sim \text{iid } N(0, \Sigma)$. The likelihood function is characterized as $P(y_1 < 0)\,P(y_1, y_2)$.

Type 4 Tobit

The Type 4 Tobit model consists of three equations:

$$y_{1i}^* = \mathbf{x}_{1i}'\boldsymbol\beta_1 + u_{1i}$$
$$y_{2i}^* = \mathbf{x}_{2i}'\boldsymbol\beta_2 + u_{2i}$$
$$y_{3i}^* = \mathbf{x}_{3i}'\boldsymbol\beta_3 + u_{3i}$$

$$y_{1i} = \begin{cases} y_{1i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \qquad
y_{2i} = \begin{cases} y_{2i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \qquad
y_{3i} = \begin{cases} y_{3i}^* & \text{if } y_{1i}^* \le 0 \\ 0 & \text{if } y_{1i}^* > 0 \end{cases}$$

where $(u_{1i}, u_{2i}, u_{3i})' \sim \text{iid } N(0, \Sigma)$. The likelihood function of the Type 4 Tobit model is characterized as $P(y_1 < 0, y_3)\,P(y_1, y_2)$.
Type 5 Tobit

The Type 5 Tobit model is defined as follows:

$$y_{1i}^* = \mathbf{x}_{1i}'\boldsymbol\beta_1 + u_{1i}$$
$$y_{2i}^* = \mathbf{x}_{2i}'\boldsymbol\beta_2 + u_{2i}$$
$$y_{3i}^* = \mathbf{x}_{3i}'\boldsymbol\beta_3 + u_{3i}$$

$$y_{1i} = \begin{cases} 1 & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \qquad
y_{2i} = \begin{cases} y_{2i}^* & \text{if } y_{1i}^* > 0 \\ 0 & \text{if } y_{1i}^* \le 0 \end{cases} \qquad
y_{3i} = \begin{cases} y_{3i}^* & \text{if } y_{1i}^* \le 0 \\ 0 & \text{if } y_{1i}^* > 0 \end{cases}$$

where $(u_{1i}, u_{2i}, u_{3i})'$ are from an iid trivariate normal distribution. The likelihood function of the Type 5 Tobit model is characterized as $P(y_1 < 0, y_3)\,P(y_1 > 0, y_2)$. Code examples for these models can be found in “Example 21.6: Types of Tobit Models” on page 1476.
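As a minimal sketch of the standard tobit model defined at the beginning of this section (the data set and variable names here are hypothetical), censoring from below at zero is requested with the CENSORED option:

   proc qlim data=a;
      model y = x1 x2;
      endogenous y ~ censored(lb=0);
   run;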
Truncated Regression Models

In a truncated model, the observed sample is a subset of the population in which the dependent variable falls within a certain range. For example, when neither the dependent variable nor the exogenous variables are observed for $y_i^* \le 0$, the truncated regression model can be specified:

$$\ell = \sum_{i \in \{y_i > 0\}} \left\{ -\ln \Phi\left(\mathbf{x}_i'\boldsymbol\beta/\sigma\right) + \ln\left[\phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\Big/\sigma\right] \right\}$$

The two-limit truncation model is defined as

$$y_i = y_i^* \quad \text{if } L_i < y_i^* < R_i$$

The log-likelihood function of the two-limit truncated regression model is

$$\ell = \sum_{i=1}^{N} \left\{ \ln\left[\phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\Big/\sigma\right] - \ln\left[\Phi\left(\frac{R_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right) - \Phi\left(\frac{L_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\right] \right\}$$

The log-likelihood functions of the lower- and upper-limit truncation models are

$$\ell = \sum_{i=1}^{N} \left\{ \ln\left[\phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\Big/\sigma\right] - \ln\left[1 - \Phi\left(\frac{L_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\right] \right\} \quad \text{(lower)}$$

$$\ell = \sum_{i=1}^{N} \left\{ \ln\left[\phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right)\Big/\sigma\right] - \ln \Phi\left(\frac{R_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right) \right\} \quad \text{(upper)}$$
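A corresponding sketch for the lower-limit truncation model (the data set and variable names here are hypothetical) uses the TRUNCATED option:

   proc qlim data=a;
      model y = x1 x2;
      endogenous y ~ truncated(lb=0);
   run;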
Stochastic Frontier Production and Cost Models

Stochastic frontier production models were first developed by Aigner, Lovell, and Schmidt (1977) and Meeusen and van den Broeck (1977). Specifications of these models allow for random shocks of the production or cost, but they also include a term for technological or cost inefficiency. Assuming that the production function takes a log-linear Cobb-Douglas form, the stochastic frontier production model can be written as

$$\ln(y_i) = \beta_0 + \sum_{n} \beta_n \ln(x_{ni}) + \epsilon_i$$

where $\epsilon_i = v_i \mp u_i$. The $v_i$ term represents the stochastic error component and $u_i$ is the nonnegative, technology inefficiency error component. The $v_i$ error component is assumed to be distributed iid normal and independently from $u_i$. When $u_i$ enters negatively ($\epsilon_i = v_i - u_i$), the error term $\epsilon_i$ is negatively skewed and represents technology inefficiency; when $u_i$ enters positively ($\epsilon_i = v_i + u_i$), the error term $\epsilon_i$ is positively skewed and represents cost inefficiency. PROC QLIM models the $u_i$ error component as a half normal, exponential, or truncated normal distribution.

The Normal-Half Normal Model

In the normal-half normal model, $v_i$ is iid $N(0, \sigma_v^2)$ and $u_i$ is iid $N^+(0, \sigma_u^2)$, with $v_i$ and $u_i$ independent of each other. Given the independence of the error terms, the joint density of $v$ and $u$ can be written as

$$f(u, v) = \frac{2}{2\pi\sigma_u\sigma_v} \exp\left\{ -\frac{u^2}{2\sigma_u^2} - \frac{v^2}{2\sigma_v^2} \right\}$$

Substituting $v = \epsilon + u$ into the preceding equation gives

$$f(u, \epsilon) = \frac{2}{2\pi\sigma_u\sigma_v} \exp\left\{ -\frac{u^2}{2\sigma_u^2} - \frac{(\epsilon + u)^2}{2\sigma_v^2} \right\}$$

Integrating $u$ out to obtain the marginal density function of $\epsilon$ results in the following form:

$$f(\epsilon) = \int_0^\infty f(u, \epsilon)\, du = \frac{2}{\sigma}\, \phi\left(\frac{\epsilon}{\sigma}\right) \Phi\left(-\frac{\epsilon\lambda}{\sigma}\right)$$

where $\lambda = \sigma_u/\sigma_v$ and $\sigma = \sqrt{\sigma_u^2 + \sigma_v^2}$.

In the case of a stochastic frontier cost model, $v = \epsilon - u$ and

$$f(\epsilon) = \frac{2}{\sigma}\, \phi\left(\frac{\epsilon}{\sigma}\right) \Phi\left(\frac{\epsilon\lambda}{\sigma}\right)$$
The log-likelihood function for the production model with $N$ producers is written as

$$\ln L = \text{constant} - N \ln \sigma + \sum_i \ln \Phi\left(-\frac{\epsilon_i \lambda}{\sigma}\right) - \frac{1}{2\sigma^2} \sum_i \epsilon_i^2$$

The Normal-Exponential Model

Under the normal-exponential model, $v_i$ is iid $N(0, \sigma_v^2)$ and $u_i$ is iid exponential. Given the independence of the error term components $u_i$ and $v_i$, the joint density of $v$ and $u$ can be written as

$$f(u, v) = \frac{1}{\sqrt{2\pi}\,\sigma_u\sigma_v} \exp\left\{ -\frac{u}{\sigma_u} - \frac{v^2}{2\sigma_v^2} \right\}$$

The marginal density function of $\epsilon$ for the production function is

$$f(\epsilon) = \int_0^\infty f(u, \epsilon)\, du = \frac{1}{\sigma_u}\, \Phi\left(-\frac{\epsilon}{\sigma_v} - \frac{\sigma_v}{\sigma_u}\right) \exp\left\{ \frac{\epsilon}{\sigma_u} + \frac{\sigma_v^2}{2\sigma_u^2} \right\}$$

and the marginal density function for the cost function is equal to

$$f(\epsilon) = \frac{1}{\sigma_u}\, \Phi\left(\frac{\epsilon}{\sigma_v} - \frac{\sigma_v}{\sigma_u}\right) \exp\left\{ -\frac{\epsilon}{\sigma_u} + \frac{\sigma_v^2}{2\sigma_u^2} \right\}$$

The log-likelihood function for the normal-exponential production model with $N$ producers is

$$\ln L = \text{constant} - N \ln \sigma_u + N\,\frac{\sigma_v^2}{2\sigma_u^2} + \sum_i \frac{\epsilon_i}{\sigma_u} + \sum_i \ln \Phi\left(-\frac{\epsilon_i}{\sigma_v} - \frac{\sigma_v}{\sigma_u}\right)$$
The Normal-Truncated Normal Model

The normal-truncated normal model is a generalization of the normal-half normal model that allows the mean of $u_i$ to differ from zero. Under the normal-truncated normal model, the error term component $v_i$ is iid $N(0, \sigma_v^2)$ and $u_i$ is iid $N^+(\mu, \sigma_u^2)$. The joint density of $v_i$ and $u_i$ can be written as

$$f(u, v) = \frac{1}{\sqrt{2\pi}\,\sigma_u\sigma_v\,\Phi(\mu/\sigma_u)} \exp\left\{ -\frac{(u - \mu)^2}{2\sigma_u^2} - \frac{v^2}{2\sigma_v^2} \right\}$$

The marginal density function of $\epsilon$ for the production function is

$$f(\epsilon) = \int_0^\infty f(u, \epsilon)\, du = \frac{1}{\sigma\,\Phi(\mu/\sigma_u)}\, \phi\left(\frac{\epsilon + \mu}{\sigma}\right) \Phi\left(\frac{\mu}{\sigma\lambda} - \frac{\epsilon\lambda}{\sigma}\right)$$

and the marginal density function for the cost function is

$$f(\epsilon) = \frac{1}{\sigma\,\Phi(\mu/\sigma_u)}\, \phi\left(\frac{\epsilon - \mu}{\sigma}\right) \Phi\left(\frac{\mu}{\sigma\lambda} + \frac{\epsilon\lambda}{\sigma}\right)$$

The log-likelihood function for the normal-truncated normal production model with $N$ producers is

$$\ln L = \text{constant} - N \ln \sigma - N \ln \Phi\left(\frac{\mu}{\sigma_u}\right) + \sum_i \ln \Phi\left(\frac{\mu}{\sigma\lambda} - \frac{\epsilon_i\lambda}{\sigma}\right) - \frac{1}{2} \sum_i \left(\frac{\epsilon_i + \mu}{\sigma}\right)^2$$
Heteroscedasticity and Box-Cox Transformation Heteroscedasticity If the variance of regression disturbance, (i ), is heteroscedastic, the variance can be specified as a function of variables E.i2 / D i2 D f .z0i /
The following table shows various functional forms of heteroscedasticity and the corresponding options to request each model.

   No.  Model                                                                  Options
   1    $f(\mathbf{z}_i'\boldsymbol\gamma) = \sigma^2(1 + \exp(\mathbf{z}_i'\boldsymbol\gamma))$           LINK=EXP (default)
   2    $f(\mathbf{z}_i'\boldsymbol\gamma) = \sigma^2\exp(\mathbf{z}_i'\boldsymbol\gamma)$                 LINK=EXP NOCONST
   3    $f(\mathbf{z}_i'\boldsymbol\gamma) = \sigma^2(1 + \sum_{l=1}^{L}\gamma_l z_{li})$                  LINK=LINEAR
   4    $f(\mathbf{z}_i'\boldsymbol\gamma) = \sigma^2(1 + (\sum_{l=1}^{L}\gamma_l z_{li})^2)$              LINK=LINEAR SQUARE
   5    $f(\mathbf{z}_i'\boldsymbol\gamma) = \sigma^2(\sum_{l=1}^{L}\gamma_l z_{li})$                      LINK=LINEAR NOCONST
   6    $f(\mathbf{z}_i'\boldsymbol\gamma) = \sigma^2((\sum_{l=1}^{L}\gamma_l z_{li})^2)$                  LINK=LINEAR SQUARE NOCONST

For discrete choice models, $\sigma^2$ is normalized ($\sigma^2 = 1$) since this parameter is not identified. Note that in models 3 and 5, it is possible that the variances of some observations are negative. Although the QLIM procedure assigns a large penalty to move the optimization away from such a region, the optimization may be unable to improve the objective function value and can become locked in the region. Signs of such an outcome include extremely small likelihood values or missing standard errors in the estimates. In models 2 and 6, variances are guaranteed to be greater than or equal to zero, but the variances of some observations may be very close to zero. In these scenarios, standard errors may be missing. Models 1 and 4 do not have such problems. Variances in these models are always positive and never close to zero.

The heteroscedastic regression model is estimated using the following log-likelihood function:

$$\ell = -\frac{N}{2}\ln(2\pi) - \sum_{i=1}^{N} \frac{1}{2}\ln(\sigma_i^2) - \frac{1}{2}\sum_{i=1}^{N}\left(\frac{e_i}{\sigma_i}\right)^2$$

where $e_i = y_i - \mathbf{x}_i'\boldsymbol\beta$.
Box-Cox Modeling

The Box-Cox transformation on $x$ is defined as

$$x^{(\lambda)} = \begin{cases} \dfrac{x^\lambda - 1}{\lambda} & \text{if } \lambda \ne 0 \\ \ln(x) & \text{if } \lambda = 0 \end{cases}$$

The Box-Cox regression model with heteroscedasticity is written as

$$y_i^{(\lambda_0)} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ki}^{(\lambda_k)} + \epsilon_i = \mu_i + \epsilon_i$$

where $\epsilon_i \sim N(0, \sigma_i^2)$ and the transformed variables must be positive. In practice, too many transformation parameters cause numerical problems in model fitting. It is common to have the same Box-Cox transformation performed on all the variables; that is, $\lambda_0 = \lambda_1 = \cdots = \lambda_K$. It is required that the magnitude of the transformed variables be in the tolerable range if the corresponding transformation parameters are $|\lambda| > 1$.

The log-likelihood function of the Box-Cox regression model is written as

$$\ell = -\frac{N}{2}\ln(2\pi) - \sum_{i=1}^{N}\ln(\sigma_i) - \frac{1}{2\sigma_i^2}\sum_{i=1}^{N} e_i^2 + (\lambda_0 - 1)\sum_{i=1}^{N}\ln(y_i)$$

where $e_i = y_i^{(\lambda_0)} - \mu_i$.
When the dependent variable is discrete, censored, or truncated, the Box-Cox transformation can be applied only to explanatory variables.
Bivariate Limited Dependent Variable Modeling

The generic form of a bivariate limited dependent variable model is

$$y_{1i}^* = \mathbf{x}_{1i}'\boldsymbol\beta_1 + \epsilon_{1i}$$
$$y_{2i}^* = \mathbf{x}_{2i}'\boldsymbol\beta_2 + \epsilon_{2i}$$

where the disturbances, $\epsilon_{1i}$ and $\epsilon_{2i}$, have a joint normal distribution with zero mean, standard deviations $\sigma_1$ and $\sigma_2$, and correlation $\rho$; $y_1^*$ and $y_2^*$ are latent variables. The dependent variables $y_1$ and $y_2$ are observed if the latent variables $y_1^*$ and $y_2^*$ fall in certain ranges:

$$y_1 = y_{1i} \quad \text{if } y_{1i}^* \in D_1(y_{1i})$$
$$y_2 = y_{2i} \quad \text{if } y_{2i}^* \in D_2(y_{2i})$$

$D$ is a transformation from $(y_{1i}^*, y_{2i}^*)$ to $(y_{1i}, y_{2i})$. For example, if $y_1$ and $y_2$ are censored variables with lower bound 0, then

$$y_1 = y_{1i}^* \text{ if } y_{1i}^* > 0, \qquad y_1 = 0 \text{ if } y_{1i}^* \le 0$$
$$y_2 = y_{2i}^* \text{ if } y_{2i}^* > 0, \qquad y_2 = 0 \text{ if } y_{2i}^* \le 0$$

There are three cases for the log likelihood of $(y_{1i}, y_{2i})$. The first case is that $y_{1i} = y_{1i}^*$ and $y_{2i} = y_{2i}^*$; that is, the observation is mapped to one point in the space of latent variables. The log likelihood is computed from a bivariate normal density:

$$\ell_i = \ln \phi_2\left(\frac{y_1 - \mathbf{x}_1'\boldsymbol\beta_1}{\sigma_1}, \frac{y_2 - \mathbf{x}_2'\boldsymbol\beta_2}{\sigma_2}, \rho\right) - \ln\sigma_1 - \ln\sigma_2$$

where $\phi_2(u, v, \rho)$ is the density function of the standardized bivariate normal distribution with correlation $\rho$:

$$\phi_2(u, v, \rho) = \frac{\exp\left\{-\tfrac{1}{2}\,(u^2 + v^2 - 2\rho uv)/(1 - \rho^2)\right\}}{2\pi(1 - \rho^2)^{1/2}}$$
The second case is that one observed dependent variable is mapped to a point of its latent variable and the other dependent variable is mapped to a segment in the space of its latent variable. For example, in the bivariate censored model specified, if the observed $y_1 > 0$ and $y_2 = 0$, then $y_1^* = y_1$ and $y_2^* \in (-\infty, 0]$. In general, the log likelihood for one observation can be written as follows (the subscript $i$ is dropped for simplicity). If one set is a single point and the other set is a range, without loss of generality let $D_1(y_1) = \{y_1\}$ and $D_2(y_2) = [L_2, R_2]$:

$$\ell_i = \ln \phi\left(\frac{y_1 - \mathbf{x}_1'\boldsymbol\beta_1}{\sigma_1}\right) - \ln\sigma_1 + \ln\left[\Phi\left(\frac{\frac{R_2 - \mathbf{x}_2'\boldsymbol\beta_2}{\sigma_2} - \rho\,\frac{y_1 - \mathbf{x}_1'\boldsymbol\beta_1}{\sigma_1}}{\sqrt{1 - \rho^2}}\right) - \Phi\left(\frac{\frac{L_2 - \mathbf{x}_2'\boldsymbol\beta_2}{\sigma_2} - \rho\,\frac{y_1 - \mathbf{x}_1'\boldsymbol\beta_1}{\sigma_1}}{\sqrt{1 - \rho^2}}\right)\right]$$

where $\phi$ and $\Phi$ are the density function and the cumulative probability function for the standardized univariate normal distribution.

The third case is that both dependent variables are mapped to segments in the space of latent variables. For example, in the bivariate censored model specified, if the observed $y_1 = 0$ and $y_2 = 0$, then $y_1^* \in (-\infty, 0]$ and $y_2^* \in (-\infty, 0]$. In general, if $D_1(y_1) = [L_1, R_1]$ and $D_2(y_2) = [L_2, R_2]$, the log likelihood is

$$\ell_i = \ln \int_{\frac{L_1 - \mathbf{x}_1'\boldsymbol\beta_1}{\sigma_1}}^{\frac{R_1 - \mathbf{x}_1'\boldsymbol\beta_1}{\sigma_1}} \int_{\frac{L_2 - \mathbf{x}_2'\boldsymbol\beta_2}{\sigma_2}}^{\frac{R_2 - \mathbf{x}_2'\boldsymbol\beta_2}{\sigma_2}} \phi_2(u, v, \rho)\, du\, dv$$
Selection Models

In sample selection models, one or several dependent variables are observed when another variable takes certain values. For example, the standard Heckman selection model can be defined as

$$z_i^* = \mathbf{w}_i'\boldsymbol\gamma + u_i$$

$$z_i = \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases}$$

$$y_i = \mathbf{x}_i'\boldsymbol\beta + \epsilon_i \quad \text{if } z_i = 1$$

where $u_i$ and $\epsilon_i$ are jointly normal with zero mean, standard deviations of 1 and $\sigma$, and correlation $\rho$. $z$ is the variable that the selection is based on, and $y$ is observed when $z$ has a value of 1. Least squares regression using the observed data of $y$ produces inconsistent estimates of $\boldsymbol\beta$. The maximum likelihood method is used to estimate selection models. It is also possible to estimate these models by using Heckman's method, which is more computationally efficient, but it can be shown that the resulting estimates, although consistent, are not asymptotically efficient under the normality assumption. Moreover, this method often violates the constraint on the correlation coefficient, $|\rho| \le 1$.
The log-likelihood function of the Heckman selection model is written as

$$\ell = \sum_{i \in \{z_i = 0\}} \ln\left[1 - \Phi(\mathbf{w}_i'\boldsymbol\gamma)\right] + \sum_{i \in \{z_i = 1\}} \left\{ \ln \phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right) - \ln\sigma + \ln \Phi\left(\frac{\mathbf{w}_i'\boldsymbol\gamma + \rho\,\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}}{\sqrt{1 - \rho^2}}\right) \right\}$$

Only one variable is allowed for the selection to be based on, but the selection may lead to several variables. For example, in the following switching regression model,

$$z_i^* = \mathbf{w}_i'\boldsymbol\gamma + u_i$$

$$z_i = \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases}$$

$$y_{1i} = \mathbf{x}_{1i}'\boldsymbol\beta_1 + \epsilon_{1i} \quad \text{if } z_i = 0$$
$$y_{2i} = \mathbf{x}_{2i}'\boldsymbol\beta_2 + \epsilon_{2i} \quad \text{if } z_i = 1$$

$z$ is the variable that the selection is based on. If $z = 0$, then $y_1$ is observed. If $z = 1$, then $y_2$ is observed. Because it is never the case that $y_1$ and $y_2$ are observed at the same time, the correlation between $y_1$ and $y_2$ cannot be estimated. Only the correlation between $z$ and $y_1$ and the correlation between $z$ and $y_2$ can be estimated. This estimation uses the maximum likelihood method. A brief example of the code for this model can be found in “Example 21.4: Sample Selection Model” on page 1472.

The Heckman selection model can include censoring or truncation. For a brief example of the code for these models, see “Example 21.5: Sample Selection Model with Truncation and Censoring” on page 1473. The following example shows a variable $y_i$ that is censored from below at zero:

$$z_i^* = \mathbf{w}_i'\boldsymbol\gamma + u_i$$

$$z_i = \begin{cases} 1 & \text{if } z_i^* > 0 \\ 0 & \text{if } z_i^* \le 0 \end{cases}$$

$$y_i^* = \mathbf{x}_i'\boldsymbol\beta + \epsilon_i \quad \text{if } z_i = 1$$

$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$
In this case, the log-likelihood function of the Heckman selection model needs to be modified to include the censored region:

$$\ell = \sum_{i \in \{z_i = 0\}} \ln\left[1 - \Phi(\mathbf{w}_i'\boldsymbol\gamma)\right] + \sum_{i \in \{z_i = 1,\, y_i = y_i^*\}} \left\{ \ln \phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right) - \ln\sigma + \ln \Phi\left(\frac{\mathbf{w}_i'\boldsymbol\gamma + \rho\,\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}}{\sqrt{1 - \rho^2}}\right) \right\} + \sum_{i \in \{z_i = 1,\, y_i = 0\}} \ln \int_{-\mathbf{w}_i'\boldsymbol\gamma}^{\infty} \int_{-\infty}^{-\mathbf{x}_i'\boldsymbol\beta/\sigma} \phi_2(u, v, \rho)\, du\, dv$$

In case $y_i$ is truncated from below at zero instead of censored, the likelihood function can be written as

$$\ell = \sum_{i \in \{z_i = 0\}} \ln\left[1 - \Phi(\mathbf{w}_i'\boldsymbol\gamma)\right] + \sum_{i \in \{z_i = 1\}} \left\{ \ln \phi\left(\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}\right) - \ln\sigma + \ln \Phi\left(\frac{\mathbf{w}_i'\boldsymbol\gamma + \rho\,\frac{y_i - \mathbf{x}_i'\boldsymbol\beta}{\sigma}}{\sqrt{1 - \rho^2}}\right) - \ln \Phi\left(\mathbf{x}_i'\boldsymbol\beta/\sigma\right) \right\}$$
Multivariate Limited Dependent Models

The multivariate model is similar to bivariate models. The generic form of the multivariate limited dependent variable model is

$$y_{1i}^* = \mathbf{x}_{1i}'\boldsymbol\beta_1 + \epsilon_{1i}$$
$$y_{2i}^* = \mathbf{x}_{2i}'\boldsymbol\beta_2 + \epsilon_{2i}$$
$$\vdots$$
$$y_{mi}^* = \mathbf{x}_{mi}'\boldsymbol\beta_m + \epsilon_{mi}$$

where $m$ is the number of models to be estimated. The vector $\boldsymbol\epsilon$ has a multivariate normal distribution with mean 0 and variance-covariance matrix $\Sigma$. Similar to bivariate models, the likelihood may involve computing multivariate normal integrations; this is done using Monte Carlo integration. (See Genz (1992) and Hajivassiliou and McFadden (1998).)

When the number of equations, $N$, increases in a system, the number of parameters increases at the rate of $N^2$ because of the correlation matrix. When the number of parameters is large, sometimes the optimization converges but some of the standard deviations are missing. This usually means that the model is over-parameterized. The default method for computing the covariance is to use the inverse Hessian matrix. The Hessian is computed by finite differences, and in over-parameterized cases, the inverse cannot be computed. It is recommended that you reduce the number of parameters in such cases. Sometimes using the outer product covariance matrix (COVEST=OP option) may also help.
Tests on Parameters

In general, the hypothesis tested can be written as

$$H_0 : h(\theta) = 0$$

where $h(\theta)$ is an $r \times 1$ vector-valued function of the parameters $\theta$ given by the $r$ expressions specified in the TEST statement.

Let $\hat V$ be the estimate of the covariance matrix of $\hat\theta$. Let $\hat\theta$ be the unconstrained estimate of $\theta$, and let $\tilde\theta$ be the constrained estimate of $\theta$ such that $h(\tilde\theta) = 0$. Let

$$A(\theta) = \partial h(\theta)/\partial\theta\, \big|_{\hat\theta}$$

Using this notation, the test statistics for the three kinds of tests are computed as follows.

The Wald test statistic is defined as

$$W = h'(\hat\theta)\left[A(\hat\theta)\,\hat V\, A'(\hat\theta)\right]^{-1} h(\hat\theta)$$

The Wald test is not invariant to reparameterization of the model (Gregory 1985; Gallant 1987, p. 219). For more information about the theoretical properties of the Wald test, see Phillips and Park (1988).

The Lagrange multiplier test statistic is

$$LM = \lambda'\, A(\tilde\theta)\,\tilde V\, A'(\tilde\theta)\,\lambda$$

where $\lambda$ is the vector of Lagrange multipliers from the computation of the restricted estimate $\tilde\theta$.

The likelihood ratio test statistic is

$$LR = 2\left( L(\hat\theta) - L(\tilde\theta) \right)$$

where $\tilde\theta$ represents the constrained estimate of $\theta$ and $L$ is the concentrated log-likelihood value.

For each kind of test, under the null hypothesis the test statistic is asymptotically distributed as a $\chi^2$ random variable with $r$ degrees of freedom, where $r$ is the number of expressions in the TEST statement. The p-values reported for the tests are computed from the $\chi^2(r)$ distribution and are only asymptotically valid.

Monte Carlo simulations suggest that the asymptotic distribution of the Wald test is a poorer approximation to its small sample distribution than that of the other two tests. However, the Wald test has the lowest computational cost, since it does not require computation of the constrained estimate $\tilde\theta$.

The following is an example of using the TEST statement to perform a likelihood ratio test:
   proc qlim;
      model y = x1 x2 x3;
      test x1 = 0, x2 * .5 + 2 * x3 = 0 / lr;
   run;
Output to SAS Data Set

XBeta, Predicted, Residual

Xbeta is the structural part on the right-hand side of the model. The predicted value is the predicted dependent variable value. For censored variables, if the predicted value is outside the boundaries, it is reported as the closest boundary. For discrete variables, it is the level whose boundaries Xbeta falls between. Residual is defined only for continuous variables and is defined as

$$\text{Residual} = \text{Observed} - \text{Predicted}$$
Error Standard Deviation

The error standard deviation is $\sigma_i$ in the model. It varies only when the HETERO statement is used.
Marginal Effects

The marginal effect is defined as the contribution of one control variable to the response variable. For the binary choice model with two response categories, \(\mu_0 = -\infty\), \(\mu_1 = 0\), \(\mu_2 = \infty\); for the ordinal response model with M response categories, \(\mu_0, \ldots, \mu_M\), define

   \[ R_{i,j} = \mu_j - x_i'\beta \]

The probability that the unobserved dependent variable is contained in the jth category can be written as

   \[ P[\mu_{j-1} < y_i^* \le \mu_j] = F(R_{i,j}) - F(R_{i,j-1}) \]

The marginal effect of changes in the regressors on the probability of \(y_i = j\) is then

   \[ \frac{\partial \mathrm{Prob}[y_i = j]}{\partial x} = \left[ f(\mu_{j-1} - x_i'\beta) - f(\mu_j - x_i'\beta) \right] \beta \]

where \(f(x) = \frac{dF(x)}{dx}\). In particular,

   \[ f(x) = \frac{dF(x)}{dx} = \begin{cases} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} & \text{(probit)} \\[4pt] \frac{e^{-x}}{\left[ 1 + e^{-x} \right]^2} & \text{(logit)} \end{cases} \]
The marginal effects in the Box-Cox regression model are

   \[ \frac{\partial E[y_i]}{\partial x} = \beta \, \frac{x^{\lambda_k - 1}}{y^{\lambda_0 - 1}} \]
The marginal effects in the truncated regression model are

   \[ \frac{\partial E[y_i \mid L_i < y_i^* < R_i]}{\partial x} = \beta \left[ 1 - \frac{(\phi(a_i) - \phi(b_i))^2}{(\Phi(b_i) - \Phi(a_i))^2} + \frac{a_i \phi(a_i) - b_i \phi(b_i)}{\Phi(b_i) - \Phi(a_i)} \right] \]

where \(a_i = \frac{L_i - x_i'\beta}{\sigma_i}\) and \(b_i = \frac{R_i - x_i'\beta}{\sigma_i}\).

The marginal effects in the censored regression model are

   \[ \frac{\partial E[y \mid x_i]}{\partial x} = \beta \times \mathrm{Prob}[L_i < y_i^* < R_i] \]
Inverse Mills Ratio, Expected and Conditionally Expected Values

Expected and conditionally expected values are computed only for continuous variables. The inverse Mills ratio is computed for censored or truncated continuous, binary discrete, and selection endogenous variables.

Let \(L_i\) and \(R_i\) be the lower boundary and upper boundary, respectively, for \(y_i\). Define \(a_i = \frac{L_i - x_i'\beta}{\sigma_i}\) and \(b_i = \frac{R_i - x_i'\beta}{\sigma_i}\). Then the inverse Mills ratio is defined as

   \[ \lambda = \frac{\phi(a_i) - \phi(b_i)}{\Phi(b_i) - \Phi(a_i)} \]

for a continuous variable and defined as

   \[ \lambda = \frac{\phi(x_i'\beta)}{\Phi(x_i'\beta)} \]

for a binary discrete variable.

The expected value is the unconditional expectation of the dependent variable. For a censored variable, it is

   \[ E[y_i] = \Phi(a_i) L_i + (x_i'\beta + \lambda \sigma_i)(\Phi(b_i) - \Phi(a_i)) + (1 - \Phi(b_i)) R_i \]

For a left-censored variable (\(R_i = \infty\)), this formula is

   \[ E[y_i] = \Phi(a_i) L_i + (x_i'\beta + \lambda \sigma_i)(1 - \Phi(a_i)) \]

where \(\lambda = \frac{\phi(a_i)}{1 - \Phi(a_i)}\).

For a right-censored variable (\(L_i = -\infty\)), this formula is

   \[ E[y_i] = (x_i'\beta + \lambda \sigma_i)\Phi(b_i) + (1 - \Phi(b_i)) R_i \]

where \(\lambda = -\frac{\phi(b_i)}{\Phi(b_i)}\).

For a noncensored variable, this formula is

   \[ E[y_i] = x_i'\beta \]

The conditional expected value is the expectation given that the variable is inside the boundaries:

   \[ E[y_i \mid L_i < y_i < R_i] = x_i'\beta + \lambda \sigma_i \]
Probability

Probability applies only to discrete responses. It is the marginal probability that the discrete response takes the value of the observation. If the PROBALL option is specified, then the probabilities for all of the possible responses of the discrete variables are computed.
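For example, the following sketch writes both the observed-value probability and the probabilities of all response levels for a discrete model to an output data set:

proc qlim data=a;
   model y = x1 x2 / discrete;
   output out=outp prob proball;   /* PROB_y and PROBi_y variables */
run;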
Technical Efficiency

Technical efficiency for each producer is computed only for stochastic frontier models. In general, the stochastic production frontier can be written as

   \[ y_i = f(x_i; \beta) \exp\{v_i\} \, TE_i \]

where \(y_i\) denotes producer i's actual output, \(f(\cdot)\) is the deterministic part of the production frontier, \(\exp\{v_i\}\) is a producer-specific error term, and \(TE_i\) is the technical efficiency coefficient, which can be written as

   \[ TE_i = \frac{y_i}{f(x_i; \beta) \exp\{v_i\}} \]

In the case of a Cobb-Douglas production function, \(TE_i = \exp\{-u_i\}\). See the section "Stochastic Frontier Production and Cost Models" on page 1450.

The cost frontier can be written in general as

   \[ E_i = c(y_i, w_i; \beta) \exp\{v_i\} / CE_i \]

where \(w_i\) denotes producer i's input prices, \(c(\cdot)\) is the deterministic part of the cost frontier, \(\exp\{v_i\}\) is a producer-specific error term, and \(CE_i\) is the cost efficiency coefficient, which can be written as

   \[ CE_i = \frac{c(y_i, w_i; \beta) \exp\{v_i\}}{E_i} \]

In the case of a Cobb-Douglas cost function, \(CE_i = \exp\{-u_i\}\). See the section "Stochastic Frontier Production and Cost Models" on page 1450. Hence, both technical and cost efficiency coefficients are the same. The estimates of technical efficiency are provided in the following subsections.

Normal-Half Normal Model

Define \(\mu_* = -\epsilon \sigma_u^2 / \sigma^2\) and \(\sigma_*^2 = \sigma_u^2 \sigma_v^2 / \sigma^2\). Then, as shown by Jondrow et al. (1982), the conditional density is as follows:

   \[ f(u \mid \epsilon) = \frac{f(u, \epsilon)}{f(\epsilon)} = \frac{1}{\sqrt{2\pi}\,\sigma_*} \exp\left( -\frac{(u - \mu_*)^2}{2\sigma_*^2} \right) \bigg/ \left[ 1 - \Phi\left( -\frac{\mu_*}{\sigma_*} \right) \right] \]

Hence, \(f(u \mid \epsilon)\) is the density for \(N^+(\mu_*, \sigma_*^2)\).

Using this result, it follows that the estimate of technical efficiency (Battese and Coelli, 1988) is

   \[ TE1_i = E(\exp\{-u_i\} \mid \epsilon_i) = \left[ \frac{1 - \Phi(\sigma_* - \mu_{*i}/\sigma_*)}{1 - \Phi(-\mu_{*i}/\sigma_*)} \right] \exp\left( -\mu_{*i} + \frac{1}{2}\sigma_*^2 \right) \]
The second version of the estimate (Jondrow et al., 1982) is

   \[ TE2_i = \exp\{-E(u_i \mid \epsilon_i)\} \]

where

   \[ E(u_i \mid \epsilon_i) = \mu_{*i} + \sigma_* \left[ \frac{\phi(-\mu_{*i}/\sigma_*)}{1 - \Phi(-\mu_{*i}/\sigma_*)} \right] = \sigma_* \left[ \frac{\phi(\epsilon_i \lambda / \sigma)}{1 - \Phi(\epsilon_i \lambda / \sigma)} - \frac{\epsilon_i \lambda}{\sigma} \right] \]
Normal-Exponential Model

Define \(A = -\tilde{\mu}/\sigma_v\) and \(\tilde{\mu} = -\epsilon - \sigma_v^2/\sigma_u\). Then, as shown by Kumbhakar and Lovell (2000), the conditional density is as follows:

   \[ f(u \mid \epsilon) = \frac{1}{\sqrt{2\pi}\,\sigma_v \left[ 1 - \Phi(-\tilde{\mu}/\sigma_v) \right]} \exp\left( -\frac{(u - \tilde{\mu})^2}{2\sigma_v^2} \right) \]

Hence, \(f(u \mid \epsilon)\) is the density for \(N^+(\tilde{\mu}, \sigma_v^2)\).

Using this result, it follows that the estimate of technical efficiency is

   \[ TE1_i = E(\exp\{-u_i\} \mid \epsilon_i) = \left[ \frac{1 - \Phi(\sigma_v - \tilde{\mu}_i/\sigma_v)}{1 - \Phi(-\tilde{\mu}_i/\sigma_v)} \right] \exp\left( -\tilde{\mu}_i + \frac{1}{2}\sigma_v^2 \right) \]

The second version of the estimate is

   \[ TE2_i = \exp\{-E(u_i \mid \epsilon_i)\} \]

where

   \[ E(u_i \mid \epsilon_i) = \tilde{\mu}_i + \sigma_v \left[ \frac{\phi(-\tilde{\mu}_i/\sigma_v)}{1 - \Phi(-\tilde{\mu}_i/\sigma_v)} \right] = \sigma_v \left[ \frac{\phi(A)}{\Phi(-A)} - A \right] \]

Normal-Truncated Normal Model

Define \(\tilde{\mu} = (-\sigma_u^2 \epsilon_i + \mu \sigma_v^2)/\sigma^2\) and \(\sigma_*^2 = \sigma_u^2 \sigma_v^2 / \sigma^2\). Then, as shown by Kumbhakar and Lovell (2000), the conditional density is as follows:

   \[ f(u \mid \epsilon) = \frac{1}{\sqrt{2\pi}\,\sigma_* \left[ 1 - \Phi(-\tilde{\mu}/\sigma_*) \right]} \exp\left( -\frac{(u - \tilde{\mu})^2}{2\sigma_*^2} \right) \]

Hence, \(f(u \mid \epsilon)\) is the density for \(N^+(\tilde{\mu}, \sigma_*^2)\).

Using this result, it follows that the estimate of technical efficiency is

   \[ TE1_i = E(\exp\{-u_i\} \mid \epsilon_i) = \left[ \frac{1 - \Phi(\sigma_* - \tilde{\mu}_i/\sigma_*)}{1 - \Phi(-\tilde{\mu}_i/\sigma_*)} \right] \exp\left( -\tilde{\mu}_i + \frac{1}{2}\sigma_*^2 \right) \]

The second version of the estimate is

   \[ TE2_i = \exp\{-E(u_i \mid \epsilon_i)\} \]

where

   \[ E(u_i \mid \epsilon_i) = \tilde{\mu}_i + \sigma_* \left[ \frac{\phi(-\tilde{\mu}_i/\sigma_*)}{1 - \Phi(-\tilde{\mu}_i/\sigma_*)} \right] \]
OUTEST= Data Set

The OUTEST= data set contains all the parameters estimated in a MODEL statement. The OUTEST= option can be used when the PROC QLIM call contains one MODEL statement:

proc qlim data=a outest=e;
   model y = x1 x2 x3;
   endogenous y ~ censored(lb=0);
run;
Each parameter contains the estimate for the corresponding parameter in the corresponding model. In addition, the OUTEST= data set contains the following variables:

_NAME_     the name of the independent variable

_TYPE_     type of observation. PARM indicates the row of coefficients; STD indicates the row of standard deviations of the corresponding coefficients.

_STATUS_   convergence status for optimization
The rest of the columns correspond to the explanatory variables. The OUTEST= data set contains one observation for the MODEL statement, giving the parameter estimates for that model. If the COVOUT option is specified, the OUTEST= data set includes additional observations for the MODEL statement, giving the rows of the covariance matrix of parameter estimates. For covariance observations, the value of the _TYPE_ variable is COV, and the _NAME_ variable identifies the parameter associated with that row of the covariance matrix. If the CORROUT option is specified, the OUTEST= data set includes additional observations for the MODEL statement, giving the rows of the correlation matrix of parameter estimates. For correlation observations, the value of the _TYPE_ variable is CORR, and the _NAME_ variable identifies the parameter associated with that row of the correlation matrix.
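As a brief sketch, the following statements add the covariance rows to the OUTEST= data set and print the result:

proc qlim data=a outest=e covout;
   model y = x1 x2 x3;
run;

proc print data=e;   /* _TYPE_ = PARM, STD, and COV observations */
run;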
Naming of Parameters

When there is only one equation in the estimation, parameters are named in the same way as in other SAS procedures such as REG, PROBIT, and so on. The constant in the regression equation is called Intercept. The coefficients on independent variables are named by the independent variables. The standard deviation of the errors is called _Sigma. If there are Box-Cox transformations, the coefficients are named _Lambdai, where i increments from 1, or as specified by the user. The limits for the discrete dependent variable are named _Limiti. If the LIMIT=varying option is specified, then _Limiti starts from 1. If the LIMIT=varying option is not specified, then _Limit1 is set to 0 and the limit parameters start from i=2. If the HETERO statement is included, the coefficients of the independent variables in the hetero equation are called _H.x, where x is the name of the independent
variable. If the parameter name includes interaction terms, it needs to be enclosed in quotation marks followed by N. The following example restricts the parameter that includes the interaction term to be greater than zero:

proc qlim data=a;
   model y = x1|x2;
   endogenous y ~ discrete;
   restrict "x1*x2"N > 0;
run;
When there are multiple equations in the estimation, the parameters in the main equation are named in the format y.x, where y is the name of the dependent variable and x is the name of the independent variable. The standard deviation of the errors is called _Sigma.y. The correlation of the errors is called _Rho for the bivariate model. For the model with three variables, the correlations are _Rho.y1.y2, _Rho.y1.y3, and _Rho.y2.y3. The construction of correlation names for multivariate models is analogous. Box-Cox parameters are called _Lambdai.y, and limit variables are called _Limiti.y. Parameters in the HETERO statement are named _H.y.x. In the OUTEST= data set, all '.' characters in parameter names are changed to '_'.
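As a sketch, a two-equation discrete model such as the following would produce parameters named in this format (for example, y1.Intercept, y1.x1, y2.x2, and the correlation parameter _Rho):

proc qlim data=a;
   model y1 = x1 x2 / discrete;
   model y2 = x2 x3 / discrete;
run;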
Naming of Output Variables

The following table shows the options in the OUTPUT statement, with the corresponding variable names and their explanations.
Option        Name          Explanation

PREDICTED     P_y           Predicted value of y
RESIDUAL      RESID_y       Residual of y, (y - Predicted y)
XBETA         XBETA_y       Structural part (x'beta) of y equation
ERRSTD        ERRSTD_y      Standard deviation of error term
PROB          PROB_y        Probability that y is taking the observed value in this
                            observation (discrete y only)
PROBALL       PROBi_y       Probability that y is taking the ith value (discrete y only)
MILLS         MILLS_y       Inverse Mills ratio for y
EXPECTED      EXPCT_y       Unconditional expected value of y
CONDITIONAL   CEXPCT_y      Conditional expected value of y, conditional on the truncation
MARGINAL      MEFF_x        Marginal effect of x on y (dy/dx) with single equation
              MEFF_y_x      Marginal effect of x on y (dy/dx) with multiple equations
              MEFF_Pi_x     Marginal effect of x on y (dProb(y=i)/dx) with single
                            equation and discrete y
              MEFF_Pi_y_x   Marginal effect of x on y (dProb(y=i)/dx) with multiple
                            equations and discrete y
TE1           TE1           Technical efficiency estimate for each producer proposed
                            by Battese and Coelli (1988)
TE2           TE2           Technical efficiency estimate for each producer proposed
                            by Jondrow et al. (1982)
If you prefer to name the output variables differently, you can use the RENAME option in the data set. For example, the following statements rename the residual of y as Resid:

proc qlim data=one;
   model y = x1-x10 / censored;
   output out=outds(rename=(resid_y=resid)) residual;
run;
ODS Table Names

PROC QLIM assigns a name to each table it creates. You can use these names to denote the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in Table 21.2.
Table 21.2  ODS Tables Produced in PROC QLIM by the MODEL Statement and TEST Statement

ODS Table Name              Description                                 Option

ODS Tables Created by the MODEL Statement and TEST Statement
ResponseProfile             Response profile                            default
ClassLevels                 Class levels                                default
FitSummary                  Summary of nonlinear estimation             default
GoodnessOfFit               Pseudo-R-square measures                    default
ConvergenceStatus           Convergence status                          default
ParameterEstimates          Parameter estimates                         default
SummaryContResponse         Summary of continuous response              default
CovB                        Covariance of parameter estimates           COVB
CorrB                       Correlation of parameter estimates          CORRB
LinCon                      Linear constraints                          ITPRINT
InputOptions                Input options                               ITPRINT
ProblemDescription          Problem description                         ITPRINT
IterStart                   Optimization start summary                  ITPRINT
IterHist                    Iteration history                           ITPRINT
IterStop                    Optimization results                        ITPRINT
ConvergenceStatus           Convergence status                          ITPRINT
ParameterEstimatesStart     Optimization start                          ITPRINT
ParameterEstimatesResults   Resulting parameters                        ITPRINT
LinConSol                   Linear constraints evaluated at solution    ITPRINT

ODS Tables Created by the TEST Statement
TestResults                 Test results                                default
Examples: QLIM Procedure
Example 21.1: Ordered Data Modeling

Cameron and Trivedi (1986) studied Australian Health Survey data. Variable definitions are given in Cameron and Trivedi (1998, p. 68). The dependent variable, dvisits, has nine ordered values. The following SAS statements estimate the ordinal probit model:
/*-- Ordered Discrete Responses --*/
proc qlim data=docvisit;
   model dvisits = sex age agesq income levyplus
                   freepoor freerepa illness actdays
                   hscore chcond1 chcond2 / discrete;
run;
The output of the QLIM procedure for ordered data modeling is shown in Output 21.1.1.

Output 21.1.1 Ordered Data Modeling

Binary Data

The QLIM Procedure

Discrete Response Profile of dvisits

Index   Value   Total Frequency
1       0       4141
2       1       782
3       2       174
4       3       30
5       4       24
6       5       9
7       6       12
8       7       12
9       8       6

Output 21.1.1 continued

Model Fit Summary

Number of Endogenous Variables   1
Endogenous Variable              dvisits
Number of Observations           5190
Log Likelihood                   -3138
Maximum Absolute Gradient        0.0003675
Number of Iterations             82
Optimization Method              Quasi-Newton
AIC                              6316
Schwarz Criterion                6447
Output 21.1.1 continued

Goodness-of-Fit Measures

Measure                  Value    Formula
Likelihood Ratio (R)     789.73   2 * (LogL - LogL0)
Upper Bound of R (U)     7065.9   - 2 * LogL0
Aldrich-Nelson           0.1321   R / (R+N)
Cragg-Uhler 1            0.1412   1 - exp(-R/N)
Cragg-Uhler 2            0.1898   (1-exp(-R/N)) / (1-exp(-U/N))
Estrella                 0.149    1 - (1-R/U)^(U/N)
Adjusted Estrella        0.1416   1 - ((LogL-K)/LogL0)^(-2/N*LogL0)
McFadden's LRI           0.1118   R / U
Veall-Zimmermann         0.2291   (R * (U+N)) / (U * (R+N))
McKelvey-Zavoina         0.2036

N = # of observations, K = # of regressors

Parameter Estimates

Parameter   DF   Estimate    Standard Error   t Value   Approx Pr > |t|
Intercept   1    -1.378705   0.147413         -9.35
sex         1    0.131885    0.043785         3.01
age         1    -0.534190   0.815907         -0.65
agesq       1    0.857308    0.898364         0.95
income      1    -0.062211   0.068017         -0.91
levyplus    1    0.137030    0.053262         2.57
freepoor    1    -0.346045   0.129638         -2.67
freerepa    1    0.178382    0.074348         2.40
illness     1    0.150485    0.015747         9.56
actdays     1    0.100575    0.005850         17.19
hscore      1    0.031862    0.009201         3.46
chcond1     1    0.061601    0.049024         1.26
chcond2     1    0.135321    0.067711         2.00
_Limit2     1    0.938884    0.031219         30.07
_Limit3     1    1.514288    0.049329         30.70
_Limit4     1    1.711660    0.058151         29.43
_Limit5     1    1.952860    0.072014         27.12
_Limit6     1    2.087422    0.081655         25.56
_Limit7     1    2.333786    0.101760         22.93
_Limit8     1    2.789796    0.156189         17.86

      Xi = 0;    /* Xi > 0 */
   endsub;
quit;
The following points should be noted regarding the LOGNGPD definition:

The parameters \(x_r\) and \(p_n\) are not estimated with the maximum likelihood method used by PROC SEVERITY, so you need to specify them as constant parameters by defining the dist_CONSTANTPARM subroutine. The signature of the LOGNGPD_CONSTANTPARM subroutine lists only the constant parameters Xr and Pn.

The parameter \(x_r\) is estimated by first using the SVRTUTIL_HILLCUTOFF utility function to compute an estimate of the cutoff point \(\hat{x}_b\) and then computing \(x_r = \hat{x}_b / e^{\hat{\mu}}\). If SVRTUTIL_HILLCUTOFF fails to compute a valid estimate, then the SVRTUTIL_PERCENTILE utility function is used to set \(\hat{x}_b\) to the \(p_n\)th percentile of the data. The parameter \(p_n\) is fixed to 0.8.
The SASHELP.SVRTDIST library is specified with the LIBRARY= option in the PROC FCMP statement to enable the LOGNGPD_PARMINIT subroutine to use the predefined utility functions (SVRTUTIL_HILLCUTOFF and SVRTUTIL_PERCENTILE) and parameter initialization subroutines (LOGN_PARMINIT and GPD_PARMINIT).

The LOGNGPD_LOWERBOUNDS subroutine defines the lower bounds for all parameters. This subroutine is required because the parameter Mu has a non-default lower bound. The bounds for Sigma and Xi must be specified. If they are not specified, they are returned as missing values, which get interpreted as having no lower bound by PROC SEVERITY. You need not specify any bounds for the constant parameters Xr and Pn, because they are not subject to optimization.

The following DATA step statements simulate a sample from a mixed tail distribution with a lognormal body and GPD tail. The parameter pn is fixed to 0.8, the same value used in the LOGNGPD_PARMINIT subroutine defined previously:

/*----- Simulate a sample for the mixed tail distribution -----*/
data testmixdist(keep=y label='Lognormal Body-GPD Tail Sample');
   call streaminit(45678);
   label y='Response Variable';

   N = 100;
   Mu = 1.5;
   Sigma = 0.25;
   Xi = 1.5;
   Pn = 0.8;

   /* Generate data comprising the lognormal body */
   Nbody = N*Pn;
   do i=1 to Nbody;
      y = exp(Mu) * rand('LOGNORMAL')**Sigma;
      output;
   end;

   /* Generate data comprising the GPD tail */
   cutoff = quantile('LOGNORMAL', Pn, Mu, Sigma);
   gpd_scale = (1-Pn) / pdf('LOGNORMAL', cutoff, Mu, Sigma);
   do i=Nbody+1 to N;
      y = cutoff + ((1-rand('UNIFORM'))**(-Xi) - 1)*gpd_scale/Xi;
      output;
   end;
run;
The following statements use PROC SEVERITY to fit the LOGNGPD distribution model to the simulated sample. They also fit three other predefined distributions (BURR, LOGN, and GPD). The final parameter estimates are written to the WORK.PARMEST data set.

/*--- Enable ODS graphics processing ---*/
ods graphics on;

/*--- Set the search path for functions defined with PROC FCMP ---*/
options cmplib=(work.sevexmpl);

/*-------- Fit LOGNGPD model with PROC SEVERITY --------*/
proc severity data=testmixdist print=all
              plots(histogram kernel)=all outest=parmest;
   model y;
   dist logngpd;
   dist burr;
   dist logn;
   dist gpd;
run;
Some of the results prepared by PROC SEVERITY are shown in Output 22.3.1 through Output 22.3.4. The model selection table of Output 22.3.1 indicates that all models converged. The last table in Output 22.3.1 shows that the model with LOGNGPD distribution has the best fit according to almost all the statistics of fit. The Burr distribution model is the closest contender to the LOGNGPD model, but the GPD distribution model fits the data very poorly.

Output 22.3.1 Summary of Fitting Mixed Tail Distribution

The SEVERITY Procedure

Input Data Set
Name    WORK.TESTMIXDIST
Label   Lognormal Body-GPD Tail Sample

Model Selection Table

Distribution   Converged   -2 Log Likelihood   Selected
logngpd        Yes         418.78232           Yes
Burr           Yes         424.93728           No
Logn           Yes         459.43471           No
Gpd            Yes         558.13444           No
Output 22.3.1 continued

All Fit Statistics Table

Distribution   -2 Log Likelihood   AIC          AICC         BIC
logngpd        418.78232*          428.78232*   429.42062*   441.80817
Burr           424.93728           430.93728    431.18728    438.75280*
Logn           459.43471           463.43471    463.55842    468.64505
Gpd            558.13444           562.13444    562.25815    567.34478

All Fit Statistics Table

Distribution   AD          CvM        KS
logngpd        0.31670*    0.04972*   0.62140*
Burr           0.57649     0.07860    0.71373
Logn           3.27122     0.48448    1.55267
Gpd            16.74156    3.31860    3.43470
The plots in Output 22.3.2 show that both the lognormal and GPD distributions fit the data poorly, GPD being the worst. The Burr distribution fits the data as well as the LOGNGPD mixed distribution in the body region, but has a poorer fit in the tail region than the LOGNGPD mixed distribution.

Output 22.3.2 Comparison of the CDF and PDF Estimates of the Fitted Models
Output 22.3.2 continued
The P-P plots of Output 22.3.3 provide a better visual confirmation that the LOGNGPD distribution fits the tail region better than the Burr distribution.

Output 22.3.3 P-P Plots for the LOGNGPD and BURR Distribution Models
Output 22.3.3 continued
The detailed results for the LOGNGPD distribution are shown in Output 22.3.4. The initial values table indicates the values computed by the LOGNGPD_PARMINIT subroutine for the Xr and Pn parameters. It also uses the bounds columns to indicate the constant parameters. The last table in the figure shows the final parameter estimates. The estimates of all free parameters are significantly different than 0. As expected, the final estimates of the constant parameters Xr and Pn have not changed from their initial values.

Output 22.3.4 Detailed Results for the LOGNGPD Distribution

The SEVERITY Procedure

Distribution Information

Name                                logngpd
Description                         Lognormal Body-GPD Tail Distribution.
                                    Mu, Sigma, and Xi are free parameters.
                                    Xr and Pn are constant parameters.
Number of Distribution Parameters   5

Initial Parameter Values and Bounds for logngpd Distribution

Parameter   Initial Value   Lower Bound   Upper Bound
Mu          1.49954         -Infty        Infty
Sigma       0.76306         1.05367E-8    Infty
Xi          0.36661         1.05367E-8    Infty
Xr          1.27395         Constant      Constant
Pn          0.80000         Constant      Constant

Convergence Status for logngpd Distribution

Convergence criterion (GCONV=1E-8) satisfied.

Optimization Summary for logngpd Distribution

Optimization Technique           Trust Region
Number of Iterations             11
Number of Function Evaluations   31
Log Likelihood                   -209.39116

Parameter Estimates for logngpd Distribution

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|
Mu          1.57921    0.06426          24.57
Sigma       0.31868    0.04459          7.15
Xi          1.03771    0.38205          2.72
Xr          1.27395    Constant         .
Pn          0.80000    Constant         .
Functional Summary

Table 25.1 summarizes the statements and options that control the SPECTRA procedure.

Table 25.1  SPECTRA Functional Summary

Description                                          Statement      Option

Statements
specify BY-group processing                          BY
specify the variables to be analyzed                 VAR
specify weights for spectral density estimates       WEIGHTS

Data Set Options
specify the input data set                           PROC SPECTRA   DATA=
specify the output data set                          PROC SPECTRA   OUT=

Output Control Options
output the amplitudes of the cross-spectrum          PROC SPECTRA   A
output the Fourier coefficients                      PROC SPECTRA   COEF
output the periodogram                               PROC SPECTRA   P
output the spectral density estimates                PROC SPECTRA   S
output cross-spectral analysis results               PROC SPECTRA   CROSS
output squared coherency of the cross-spectrum       PROC SPECTRA   K
output the phase of the cross-spectrum               PROC SPECTRA   PH

Smoothing Options
specify the Bartlett kernel                          WEIGHTS        BART
specify the Parzen kernel                            WEIGHTS        PARZEN
specify the quadratic spectral kernel                WEIGHTS        QS
specify the Tukey-Hanning kernel                     WEIGHTS        TUKEY
specify the truncated kernel                         WEIGHTS        TRUNCAT

Other Options
subtract the series mean                             PROC SPECTRA   ADJMEAN
specify an alternate quadrature spectrum estimate    PROC SPECTRA   ALTW
request tests for white noise                        PROC SPECTRA   WHITETEST
PROC SPECTRA Statement

PROC SPECTRA options ;

The following options can be used in the PROC SPECTRA statement:

A
   outputs the amplitude variables (A_nn_mm) of the cross-spectrum.

ADJMEAN
CENTER
   subtracts the series mean before performing the Fourier decomposition. This sets the first periodogram ordinate to 0 rather than 2n times the squared mean. This option is commonly used when the periodograms are to be plotted to prevent a large first periodogram ordinate from distorting the scale of the plot.

ALTW
   specifies that the quadrature spectrum estimate is computed at the boundaries in the same way as the spectral density estimate and the cospectrum estimate are computed.
COEF
   outputs the Fourier cosine and sine coefficients of each series.

CROSS
   is used with the P and S options to output cross-periodograms and cross-spectral densities when more than one variable is listed in the VAR statement.

DATA=SAS-data-set
   names the SAS data set that contains the input data. If the DATA= option is omitted, the most recently created SAS data set is used.

K
   outputs the squared coherency variables (K_nn_mm) of the cross-spectrum. The K_nn_mm variables are identically 1 unless weights are given in the WEIGHTS statement and the S option is specified.

OUT=SAS-data-set
   names the output data set created by PROC SPECTRA to store the results. If the OUT= option is omitted, the output data set is named by using the DATAn convention.

P
   outputs the periodogram variables. The variables are named P_nn, where nn is an index of the original variable with which the periodogram variable is associated. When both the P and CROSS options are specified, the cross-periodogram variables RP_nn_mm and IP_nn_mm are also output.

PH
   outputs the phase variables (PH_nn_mm) of the cross-spectrum.

S
   outputs the spectral density estimates. The variables are named S_nn, where nn is an index of the original variable with which the estimate variable is associated. When both the S and CROSS options are specified, the cross-spectral variables CS_nn_mm and QS_nn_mm are also output.

WHITETEST
   prints two tests of the hypothesis that the data are white noise. See the section "White Noise Test" on page 1699 for details.

Note that the CROSS, A, K, and PH options are meaningful only if more than one variable is listed in the VAR statement.
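As a brief sketch that combines several of these options (the input data set and variable names are hypothetical):

proc spectra data=series out=spec p s adjmean whitetest;
   var x y;
   weights 1 2 3 2 1;   /* smooth the spectral density estimates */
run;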
BY Statement

BY variables ;
A BY statement can be used with PROC SPECTRA to obtain separate analyses for groups of observations defined by the BY variables.
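For example, the following sketch (with a hypothetical grouping variable region) analyzes each group separately; the input data set must be sorted by the BY variable:

proc sort data=a;
   by region;
run;

proc spectra data=a out=b p;
   by region;
   var x;
run;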
VAR Statement

VAR variables ;
The VAR statement specifies one or more numeric variables that contain the time series to analyze. The order of the variables in the VAR statement list determines the index, nn, used to name the output variables. The VAR statement is required.
WEIGHTS Statement

WEIGHTS weight-constants | kernel-specification ;
The WEIGHTS statement specifies the relative weights used in the moving average applied to the periodogram ordinates to form the spectral density estimates. A WEIGHTS statement must be used to produce smoothed spectral density estimates. You can specify the relative weights in two ways: you can specify them explicitly as explained in the section “Using Weight Constants Specification” on page 1695, or you can specify them implicitly by using the kernel specification as explained in the section “Using Kernel Specifications” on page 1695. If the WEIGHTS statement is not used, only the periodogram is produced.
Using Weight Constants Specification

Any number of weighting constants can be specified. The constants should be positive and symmetric about the middle weight. The middle constant (or the constant to the right of the middle if an even number of weight constants are specified) is the relative weight of the current periodogram ordinate. The constant immediately following the middle one is the relative weight of the next periodogram ordinate, and so on. The actual weights used in the smoothing process are the weights specified in the WEIGHTS statement scaled so that they sum to \(\frac{1}{4\pi}\).

The moving average reflects at each end of the periodogram. The first periodogram ordinate is not used; the second periodogram ordinate is used in its place. For example, a simple triangular weighting can be specified using the following WEIGHTS statement:

weights 1 2 3 2 1;
Using Kernel Specifications

You can specify five different kernels in the WEIGHTS statement. The syntax for the statement is

WEIGHTS [PARZEN][BART][TUKEY][TRUNCAT][QS] [c e] ;

where \(c \ge 0\) and \(e \ge 0\) are used to compute the bandwidth parameter as

   \[ l(q) = c\,q^e \]
and q is the number of periodogram ordinates + 1:

   \[ q = \lfloor n/2 \rfloor + 1 \]

To specify the bandwidth explicitly, set c to the desired bandwidth and e = 0. For example, a Parzen kernel can be specified using the following WEIGHTS statement:

weights parzen 0.5 0;
For details, see the section “Kernels” on page 1697.
Details: SPECTRA Procedure
Input Data

Observations in the data set analyzed by the SPECTRA procedure should form ordered, equally spaced time series. No more than 99 variables can be included in the analysis.

Data are often detrended before analysis by the SPECTRA procedure. This can be done by using the residuals output by a SAS regression procedure. Optionally, the data can be centered using the ADJMEAN option in the PROC SPECTRA statement, since the zero periodogram ordinate corresponding to the mean is of little interest from the point of view of spectral analysis.
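For example, the following sketch (with hypothetical variable names x and t) detrends a series by regressing it on time with PROC REG and then analyzes the residuals:

proc reg data=a noprint;
   model x = t;
   output out=detrended r=xres;   /* residuals from the linear trend */
run;

proc spectra data=detrended out=spec p s adjmean;
   var xres;
   weights 1 2 3 2 1;
run;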
Missing Values

Missing values are excluded from the analysis by the SPECTRA procedure. If the SPECTRA procedure encounters missing values for any variable listed in the VAR statement, the procedure determines the longest contiguous span of data that has no missing values for the variables listed in the VAR statement and uses that span for the analysis.
Computational Method

If the number of observations n factors into prime integers that are less than or equal to 23, and the product of the square-free factors of n is less than 210, then PROC SPECTRA uses the fast Fourier transform developed by Cooley and Tukey and implemented by Singleton (1969). If n cannot be factored in this way, then PROC SPECTRA uses a Chirp-Z algorithm similar to that proposed by Monro and Branch (1976). To reduce memory requirements, when n is small, the Fourier coefficients are computed directly using the defining formulas.
Kernels

Kernels are used to smooth the periodogram by using a weighted moving average of nearby points. A smoothed periodogram is defined by the following equation:

   \[ \hat{J}_i(l(q)) = \sum_{\tau = -l(q)}^{l(q)} w\!\left( \frac{\tau}{l(q)} \right) \tilde{J}_{i+\tau} \]

where \(w(x)\) is the kernel or weight function. At the endpoints, the moving average is computed cyclically; that is,

   \[ \tilde{J}_{i+\tau} = \begin{cases} J_{i+\tau} & 0 \le i+\tau \le q \\ J_{-(i+\tau)} & i+\tau < 0 \\ J_{q-(i+\tau)} & i+\tau > q \end{cases} \]

Example 25.2: Cross-Spectral Analysis

      if i > 0 then output;
      xll = xl;
      xl = x;
   end;
run;
proc spectra data=a out=b cross coef a k p ph s;
   var x y;
   weights 1 1.5 2 4 8 9 8 4 2 1.5 1;
run;

proc contents data=b position;
run;
The PROC CONTENTS report for the output data set B is shown in Output 25.2.1.

Output 25.2.1 Contents of PROC SPECTRA OUT= Data Set

The CONTENTS Procedure

Alphabetic List of Variables and Attributes

#    Variable   Type   Len   Label
16   A_01_02    Num    8     Amplitude of x by y
3    COS_01     Num    8     Cosine Transform of x
5    COS_02     Num    8     Cosine Transform of y
13   CS_01_02   Num    8     Cospectra of x by y
1    FREQ       Num    8     Frequency from 0 to PI
12   IP_01_02   Num    8     Imag Periodogram of x by y
15   K_01_02    Num    8     Coherency**2 of x by y
2    PERIOD     Num    8     Period
17   PH_01_02   Num    8     Phase of x by y
7    P_01       Num    8     Periodogram of x
8    P_02       Num    8     Periodogram of y
14   QS_01_02   Num    8     Quadrature of x by y
11   RP_01_02   Num    8     Real Periodogram of x by y
4    SIN_01     Num    8     Sine Transform of x
6    SIN_02     Num    8     Sine Transform of y
9    S_01       Num    8     Spectral Density of x
10   S_02       Num    8     Spectral Density of y
The following statements plot the amplitude of the cross-spectrum estimate against frequency and against period for periods less than 25.

proc sgplot data=b;
   series x=freq y=a_01_02 / markers
          markerattrs=(symbol=circlefilled);
   xaxis values=(0 to 4 by 1);
run;
The plot of the amplitude of the cross-spectrum estimate against frequency is shown in Output 25.2.2.
Output 25.2.2 Plot of Cross-Spectrum Amplitude by Frequency
The plot of the cross-spectrum amplitude against period for periods less than 25 observations is shown in Output 25.2.3.

proc sgplot data=b;
   where period < 25;
   series x=period y=a_01_02 / markers
          markerattrs=(symbol=circlefilled);
   xaxis values=(0 to 30 by 5);
run;
Output 25.2.3 Plot of Cross-Spectrum Amplitude by Period
References

Anderson, T. W. (1971), The Statistical Analysis of Time Series, New York: John Wiley & Sons.

Andrews, D. W. K. (1991), "Heteroscedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59 (3), 817–858.

Bartlett, M. S. (1966), An Introduction to Stochastic Processes, Second Edition, Cambridge: Cambridge University Press.

Brillinger, D. R. (1975), Time Series: Data Analysis and Theory, New York: Holt, Rinehart and Winston, Inc.

Davis, H. T. (1941), The Analysis of Economic Time Series, Bloomington, IN: Principia Press.

Durbin, J. (1967), "Tests of Serial Independence Based on the Cumulated Periodogram," Bulletin of Int. Stat. Inst., 42, 1039–1049.

Fuller, W. A. (1976), Introduction to Statistical Time Series, New York: John Wiley & Sons.

Gentleman, W. M. and Sande, G. (1966), "Fast Fourier Transforms–for Fun and Profit," AFIPS Proceedings of the Fall Joint Computer Conference, 19, 563–578.

Jenkins, G. M. and Watts, D. G. (1968), Spectral Analysis and Its Applications, San Francisco: Holden-Day.

Miller, L. H. (1956), "Tables of Percentage Points of Kolmogorov Statistics," Journal of American Statistical Association, 51, 111.

Monro, D. M. and Branch, J. L. (1976), "Algorithm AS 117. The Chirp Discrete Fourier Transform of General Length," Applied Statistics, 26, 351–361.

Nussbaumer, H. J. (1982), Fast Fourier Transform and Convolution Algorithms, Second Edition, New York: Springer-Verlag.

Owen, D. B. (1962), Handbook of Statistical Tables, Addison Wesley.

Parzen, E. (1957), "On Consistent Estimates of the Spectrum of a Stationary Time Series," Annals of Mathematical Statistics, 28, 329–348.

Priestley, M. B. (1981), Spectral Analysis and Time Series, New York: Academic Press, Inc.

Singleton, R. C. (1969), "An Algorithm for Computing the Mixed Radix Fast Fourier Transform," I.E.E.E. Transactions of Audio and Electroacoustics, AU-17, 93–103.
Chapter 26
The STATESPACE Procedure

Contents
Overview: STATESPACE Procedure . . . . . . . . . . . . . . . . . . . . 1716
   The State Space Model . . . . . . . . . . . . . . . . . . . . . . . 1716
   How PROC STATESPACE Works . . . . . . . . . . . . . . . . . . . . . 1717
Getting Started: STATESPACE Procedure . . . . . . . . . . . . . . . . . 1718
   Automatic State Space Model Selection . . . . . . . . . . . . . . . 1719
   Specifying the State Space Model . . . . . . . . . . . . . . . . . . 1726
Syntax: STATESPACE Procedure . . . . . . . . . . . . . . . . . . . . . 1728
   Functional Summary . . . . . . . . . . . . . . . . . . . . . . . . . 1729
   PROC STATESPACE Statement . . . . . . . . . . . . . . . . . . . . . 1730
   BY Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1734
   FORM Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 1734
   ID Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1734
   INITIAL Statement . . . . . . . . . . . . . . . . . . . . . . . . . 1735
   RESTRICT Statement . . . . . . . . . . . . . . . . . . . . . . . . . 1735
   VAR Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 1735
Details: STATESPACE Procedure . . . . . . . . . . . . . . . . . . . . . 1736
   Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . . . 1736
   Stationarity and Differencing . . . . . . . . . . . . . . . . . . . 1736
   Preliminary Autoregressive Models . . . . . . . . . . . . . . . . . 1738
   Canonical Correlation Analysis . . . . . . . . . . . . . . . . . . . 1741
   Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . 1744
   Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1745
   Relation of ARMA and State Space Forms . . . . . . . . . . . . . . . 1747
   OUT= Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . 1749
   OUTAR= Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . 1749
   OUTMODEL= Data Set . . . . . . . . . . . . . . . . . . . . . . . . . 1750
   Printed Output . . . . . . . . . . . . . . . . . . . . . . . . . . . 1751
   ODS Table Names . . . . . . . . . . . . . . . . . . . . . . . . . . 1752
Examples: STATESPACE Procedure . . . . . . . . . . . . . . . . . . . . 1753
   Example 26.1: Series J from Box and Jenkins . . . . . . . . . . . . 1753
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1758
Overview: STATESPACE Procedure

The STATESPACE procedure uses the state space model to analyze and forecast multivariate time series. The STATESPACE procedure is appropriate for jointly forecasting several related time series that have dynamic interactions. By taking into account the autocorrelations among all the variables in a set, the STATESPACE procedure can give better forecasts than methods that model each series separately.

By default, the STATESPACE procedure automatically selects a state space model appropriate for the time series, making the procedure a good tool for automatic forecasting of multivariate time series. Alternatively, you can specify the state space model by giving the form of the state vector and the state transition and innovation matrices.

The methods used by the STATESPACE procedure assume that the time series are jointly stationary. Nonstationary series must be made stationary by some preliminary transformation, usually by differencing. The STATESPACE procedure enables you to specify differencing of the input data. When differencing is specified, the STATESPACE procedure automatically integrates forecasts of the differenced series to produce forecasts of the original series.
The State Space Model

The state space model represents a multivariate time series through auxiliary variables, some of which might not be directly observable. These auxiliary variables are called the state vector. The state vector summarizes all the information from the present and past values of the time series that is relevant to the prediction of future values of the series. The observed time series are expressed as linear combinations of the state variables. The state space model is also called a Markovian representation, or a canonical representation, of a multivariate time series process. The state space approach to modeling a multivariate stationary time series is summarized in Akaike (1976).

The state space form encompasses a very rich class of models. Any Gaussian multivariate stationary time series can be written in a state space form, provided that the dimension of the predictor space is finite. In particular, any autoregressive moving average (ARMA) process has a state space representation and, conversely, any state space process can be expressed in an ARMA form (Akaike 1974). More details on the relation of the state space and ARMA forms are given in the section "Relation of ARMA and State Space Forms" on page 1747.

Let \(x_t\) be the \(r \times 1\) vector of observed variables, after differencing (if differencing is specified) and subtracting the sample mean. Let \(z_t\) be the state vector of dimension \(s\), \(s \ge r\), where the first \(r\) components of \(z_t\) consist of \(x_t\). Let the notation \(x_{t+k|t}\) represent the conditional expectation (or prediction) of \(x_{t+k}\) based on the information available at time \(t\). Then the last \(s - r\) elements of \(z_t\) consist of elements of \(x_{t+k|t}\), where \(k > 0\) is specified or determined automatically by the procedure.

There are various forms of the state space model in use. The form of the state space model used by the STATESPACE procedure is based on Akaike (1976). The model is defined by the following state
transition equation:

   \[ z_{t+1} = F z_t + G e_{t+1} \]

In the state transition equation, the \(s \times s\) coefficient matrix \(F\) is called the transition matrix; it determines the dynamic properties of the model. The \(s \times r\) coefficient matrix \(G\) is called the input matrix; it determines the variance structure of the transition equation. For model identification, the first \(r\) rows and columns of \(G\) are set to an \(r \times r\) identity matrix.

The input vector \(e_t\) is a sequence of independent normally distributed random vectors of dimension \(r\) with mean 0 and covariance matrix \(\Sigma_{ee}\). The random error \(e_t\) is sometimes called the innovation vector or shock vector.

In addition to the state transition equation, state space models usually include a measurement equation or observation equation that gives the observed values \(x_t\) as a function of the state vector \(z_t\). However, since PROC STATESPACE always includes the observed values \(x_t\) in the state vector \(z_t\), the measurement equation in this case merely represents the extraction of the first \(r\) components of the state vector. The measurement equation used by the STATESPACE procedure is

   \[ x_t = [\, I_r \;\; 0 \,] z_t \]

where \(I_r\) is an \(r \times r\) identity matrix. In practice, PROC STATESPACE performs the extraction of \(x_t\) from \(z_t\) without reference to an explicit measurement equation.

In summary:

x_t   is an observation vector of dimension r.
z_t   is a state vector of dimension s, whose first r elements are x_t and whose last s - r elements are conditional predictions of future x_t.
F     is an s x s transition matrix.
G     is an s x r input matrix, with the identity matrix I_r forming the first r rows and columns.
e_t   is a sequence of independent normally distributed random vectors of dimension r with mean 0 and covariance matrix \(\Sigma_{ee}\).
How PROC STATESPACE Works

The design of the STATESPACE procedure closely follows the modeling strategy proposed by Akaike (1976). This strategy employs canonical correlation analysis for the automatic identification of the state space model.

Following Akaike (1976), the procedure first fits a sequence of unrestricted vector autoregressive (VAR) models and computes Akaike's information criterion (AIC) for each model. The vector
autoregressive models are estimated using the sample autocovariance matrices and the Yule-Walker equations. The order of the VAR model that produces the smallest Akaike information criterion is chosen as the order (number of lags into the past) to use in the canonical correlation analysis. The elements of the state vector are then determined via a sequence of canonical correlation analyses of the sample autocovariance matrices through the selected order. This analysis computes the sample canonical correlations of the past with an increasing number of steps into the future. Variables that yield significant correlations are added to the state vector; those that yield insignificant correlations are excluded from further consideration. The importance of the correlation is judged on the basis of another information criterion proposed by Akaike. See the section “Canonical Correlation Analysis Options” on page 1731 for details. If you specify the state vector explicitly, these model identification steps are omitted. After the state vector is determined, the state space model is fit to the data. The free parameters in the F, G, and †ee matrices are estimated by approximate maximum likelihood. By default, the F and G matrices are unrestricted, except for identifiability requirements. Optionally, conditional least squares estimates can be computed. You can impose restrictions on elements of the F and G matrices. After the parameters are estimated, the Kalman filtering technique is used to produce forecasts from the fitted state space model. If differencing was specified, the forecasts are integrated to produce forecasts of the original input variables.
Getting Started: STATESPACE Procedure

The following introductory example uses simulated data for two variables X and Y. The following statements generate the X and Y series.

data in;
   x=10; y=40;
   x1=0; y1=0;
   a1=0; b1=0;
   iseed=123;
   do t=-100 to 200;
      a=rannor(iseed);
      b=rannor(iseed);
      dx = 0.5*x1 + 0.3*y1 + a - 0.2*a1 - 0.1*b1;
      dy = 0.3*x1 + 0.5*y1 + b;
      x = x + dx + .25;
      y = y + dy + .25;
      if t >= 0 then output;
      x1 = dx; y1 = dy;
      a1 = a; b1 = b;
   end;
   keep t x y;
run;
The simulated series X and Y are shown in Figure 26.1.
Figure 26.1 Example Series
Automatic State Space Model Selection

The STATESPACE procedure is designed to automatically select the best state space model for forecasting the series. You can specify your own model if you want, and you can use the output from PROC STATESPACE to help you identify a state space model. However, the easiest way to use PROC STATESPACE is to let it choose the model.
Stationarity and Differencing

Although PROC STATESPACE selects the state space model automatically, it does assume that the input series are stationary. If the series are nonstationary, then the process might fail. Therefore the first step is to examine your data and test to see if differencing is required. (See the section "Stationarity and Differencing" on page 1736 for further discussion of this issue.)

The series shown in Figure 26.1 are nonstationary. In order to forecast X and Y with a state space model, you must difference them (or use some other detrending method). If you fail to difference
when needed and try to use PROC STATESPACE with nonstationary data, an inappropriate state space model might be selected, and the model estimation might fail to converge.

The following statements identify and fit a state space model for the first differences of X and Y, and forecast X and Y 10 periods ahead:

proc statespace data=in out=out lead=10;
   var x(1) y(1);
   id t;
run;
The DATA= option specifies the input data set and the OUT= option specifies the output data set for the forecasts. The LEAD= option specifies forecasting 10 observations past the end of the input data. The VAR statement specifies the variables to forecast and specifies differencing. The notation X(1) Y(1) specifies that the state space model analyzes the first differences of X and Y.
Descriptive Statistics and Preliminary Autoregressions

The first page of the printed output produced by the preceding statements is shown in Figure 26.2.

Figure 26.2 Descriptive Statistics and VAR Order Selection

The STATESPACE Procedure

Number of Observations   200

Variable   Mean       Standard Error
x          0.144316   1.233457         Has been differenced. With period(s) = 1.
y          0.164871   1.304358         Has been differenced. With period(s) = 1.

Information Criterion for Autoregressive Models

Lag=0     Lag=1      Lag=2      Lag=3      Lag=4      Lag=5      Lag=6      Lag=7      Lag=8
149.697   8.387786   5.517099   12.05986   15.36952   21.79538   24.00638   29.88874   33.55708

Information Criterion for Autoregressive Models

Lag=9      Lag=10
41.17606   47.70222
Figure 26.2 continued

Schematic Representation of Correlations

Name/Lag   0    1    2    3    4    5    6    7    8    9    10
x          ++   ++   ++   ++   ++   ++   +.   ..   +.   +.   ..
y          ++   ++   ++   ++   ++   +.   +.   +.   +.   ..   ..

+ is > 2*std error,  - is < -2*std error,  . is between
Descriptive statistics are printed first, giving the number of nonmissing observations after differencing and the sample means and standard deviations of the differenced series. The sample means are subtracted before the series are modeled (unless the NOCENTER option is specified), and the sample means are added back when the forecasts are produced.

Let \(X_t\) and \(Y_t\) be the observed values of X and Y, and let \(x_t\) and \(y_t\) be the values of X and Y after differencing and subtracting the mean difference. The series \(x_t\) modeled by the STATESPACE procedure is

   \[ x_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} (1 - B)X_t - 0.144316 \\ (1 - B)Y_t - 0.164871 \end{bmatrix} \]
where B represents the backshift operator.

After the descriptive statistics, PROC STATESPACE prints the Akaike information criterion (AIC) values for the autoregressive models fit to the series. The smallest AIC value, in this case 5.517 at lag 2, determines the number of autocovariance matrices analyzed in the canonical correlation phase.

A schematic representation of the autocorrelations is printed next. This indicates which elements of the autocorrelation matrices at different lags are significantly greater than or less than 0.

The second page of the STATESPACE printed output is shown in Figure 26.3.

Figure 26.3 Partial Autocorrelations and VAR Model

Schematic Representation of Partial Autocorrelations

Name/Lag   1    2    3    4    5    6    7    8    9    10
x          ++   +.   ..   ..   ..   ..   ..   ..   ..   ..
y          ++   ..   ..   ..   ..   ..   ..   ..   ..   ..

+ is > 2*std error,  - is < -2*std error,  . is between

Yule-Walker Estimates for Minimum AIC

       --------Lag=1-------    --------Lag=2-------
       x          y            x          y
x      0.257438   0.170812     0.202237   0.133554
y      0.292177   -0.00537     0.469297   -0.00048
Figure 26.3 shows a schematic representation of the partial autocorrelations, similar to the autocorrelations shown in Figure 26.2. The selection of a second order autoregressive model by the AIC statistic looks reasonable in this case because the partial autocorrelations for lags greater than 2 are not significant. Next, the Yule-Walker estimates for the selected autoregressive model are printed. This output shows the coefficient matrices of the vector autoregressive model at each lag.
Selected State Space Model Form and Preliminary Estimates

After the autoregressive order selection process has determined the number of lags to consider, the canonical correlation analysis phase selects the state vector. By default, output for this process is not printed. You can use the CANCORR option to print details of the canonical correlation analysis. See the section "Canonical Correlation Analysis Options" on page 1731 for an explanation of this process.

After the state vector is selected, the state space model is estimated by approximate maximum likelihood. Information from the canonical correlation analysis and from the preliminary autoregression is used to form preliminary estimates of the state space model parameters. These preliminary estimates are used as starting values for the iterative estimation process.

The form of the state vector and the preliminary estimates are printed next, as shown in Figure 26.4.

Figure 26.4 Preliminary Estimates of State Space Model

The STATESPACE Procedure

Selected Statespace Form and Preliminary Estimates

State Vector
x(T;T)   y(T;T)   x(T+1;T)

Estimate of Transition Matrix
0          0          1
0.291536   0.468762   -0.00411
0.24869    0.24484    0.204257

Input Matrix for Innovation
1          0
0          1
0.257438   0.202237

Variance Matrix for Innovation
0.945196   0.100786
0.100786   1.014703
Figure 26.4 first prints the state vector as X[T;T] Y[T;T] X[T+1;T]. This notation indicates that the state vector is

   \[ z_t = \begin{bmatrix} x_{t|t} \\ y_{t|t} \\ x_{t+1|t} \end{bmatrix} \]

The notation \(x_{t+1|t}\) indicates the conditional expectation or prediction of \(x_{t+1}\) based on the information available at time t, and \(x_{t|t}\) and \(y_{t|t}\) are \(x_t\) and \(y_t\), respectively.

The remainder of Figure 26.4 shows the preliminary estimates of the transition matrix F, the input matrix G, and the covariance matrix \(\Sigma_{ee}\).
Estimated State Space Model

The next page of the STATESPACE output prints the final estimates of the fitted model, as shown in Figure 26.5. This output has the same form as in Figure 26.4, but it shows the maximum likelihood estimates instead of the preliminary estimates.

Figure 26.5 Fitted State Space Model

The STATESPACE Procedure

Selected Statespace Form and Fitted Model

State Vector
x(T;T)   y(T;T)   x(T+1;T)

Estimate of Transition Matrix
0          0          1
0.297273   0.47376    -0.01998
0.2301     0.228425   0.256031

Input Matrix for Innovation
1          0
0          1
0.257284   0.202273

Variance Matrix for Innovation
0.945188   0.100752
0.100752   1.014712
The estimated state space model shown in Figure 26.5 is

   \[ \begin{bmatrix} x_{t+1|t+1} \\ y_{t+1|t+1} \\ x_{t+2|t+1} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0.297 & 0.474 & -0.020 \\ 0.230 & 0.228 & 0.256 \end{bmatrix} \begin{bmatrix} x_t \\ y_t \\ x_{t+1|t} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0.257 & 0.202 \end{bmatrix} \begin{bmatrix} e_{t+1} \\ n_{t+1} \end{bmatrix} \]

   \[ \mathrm{var} \begin{bmatrix} e_{t+1} \\ n_{t+1} \end{bmatrix} = \begin{bmatrix} 0.945 & 0.101 \\ 0.101 & 1.015 \end{bmatrix} \]
The next page of the STATESPACE output lists the estimates of the free parameters in the F and G matrices with standard errors and t statistics, as shown in Figure 26.6.

Figure 26.6 Final Parameter Estimates

Parameter Estimates

Parameter   Estimate   Standard Error   t Value
F(2,1)      0.297273   0.129995         2.29
F(2,2)      0.473760   0.115688         4.10
F(2,3)      -0.01998   0.313025         -0.06
F(3,1)      0.230100   0.126226         1.82
F(3,2)      0.228425   0.112978         2.02
F(3,3)      0.256031   0.305256         0.84
G(3,1)      0.257284   0.071060         3.62
G(3,2)      0.202273   0.068593         2.95
Convergence Failures

The maximum likelihood estimates are computed by an iterative nonlinear maximization algorithm, which might not converge. If the estimates fail to converge, warning messages are printed in the output.

If you encounter convergence problems, you should recheck the stationarity of the data and ensure that the specified differencing orders are correct. Attempting to fit state space models to nonstationary data is a common cause of convergence failure. You can also use the MAXIT= option to increase the number of iterations allowed, or experiment with the convergence tolerance options DETTOL= and PARMTOL=.
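For example, the following sketch raises the iteration limit for the example model (the value 100 is illustrative, not a recommendation):

proc statespace data=in out=out lead=10 maxit=100;
   var x(1) y(1);
   id t;
run;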
Forecast Data Set

The following statements print the output data set. The WHERE statement excludes the first 190 observations from the output, so that only the forecasts and the last 10 actual observations are printed.

proc print data=out;
   id t;
   where t > 190;
run;
The PROC PRINT output is shown in Figure 26.7.

Figure 26.7 OUT= Data Set Produced by PROC STATESPACE

t     x         FOR1      RES1       STD1      y         FOR2      RES2       STD2
191   34.8159   33.6299   1.18600    0.97221   58.7189   57.9916   0.72728    1.00733
192   35.0656   35.6598   -0.59419   0.97221   58.5440   59.7718   -1.22780   1.00733
193   34.7034   35.5530   -0.84962   0.97221   59.0476   58.5723   0.47522    1.00733
194   34.6626   34.7597   -0.09707   0.97221   59.7774   59.2241   0.55330    1.00733
195   34.4055   34.8322   -0.42664   0.97221   60.5118   60.1544   0.35738    1.00733
196   33.8210   34.6053   -0.78434   0.97221   59.8750   60.8260   -0.95102   1.00733
197   34.0164   33.6230   0.39333    0.97221   58.4698   59.4502   -0.98046   1.00733
198   35.3819   33.6251   1.75684    0.97221   60.6782   57.9167   2.76150    1.00733
199   36.2954   36.0528   0.24256    0.97221   60.9692   62.1637   -1.19450   1.00733
200   37.8945   37.1431   0.75142    0.97221   60.8586   61.4085   -0.54984   1.00733
201   .         38.5068   .          0.97221   .         61.3161   .          1.00733
202   .         39.0428   .          1.59125   .         61.7509   .          1.83678
203   .         39.4619   .          2.28028   .         62.1546   .          2.62366
204   .         39.8284   .          2.97824   .         62.5099   .          3.38839
205   .         40.1474   .          3.67689   .         62.8275   .          4.12805
206   .         40.4310   .          4.36299   .         63.1139   .          4.84149
207   .         40.6861   .          5.03040   .         63.3755   .          5.52744
208   .         40.9185   .          5.67548   .         63.6174   .          6.18564
209   .         41.1330   .          6.29673   .         63.8435   .          6.81655
210   .         41.3332   .          6.89383   .         64.0572   .          7.42114
The OUT= data set produced by PROC STATESPACE contains the VAR and ID statement variables. In addition, for each VAR statement variable, the OUT= data set contains the variables FORi, RESi, and STDi. These variables contain the predicted values, residuals, and forecast standard errors for the ith variable in the VAR statement list. In this case, X is listed first in the VAR statement, so FOR1 contains the forecasts of X, while FOR2 contains the forecasts of Y.

The following statements plot the forecasts and actuals for the series.

proc sgplot data=out noautolegend;
   where t > 150;
   series x=t y=for1 / markers
          markerattrs=(symbol=circle color=blue)
          lineattrs=(pattern=solid color=blue);
   series x=t y=for2 / markers
          markerattrs=(symbol=circle color=blue)
          lineattrs=(pattern=solid color=blue);
   series x=t y=x / markers
          markerattrs=(symbol=circle color=red)
          lineattrs=(pattern=solid color=red);
   series x=t y=y / markers
          markerattrs=(symbol=circle color=red)
          lineattrs=(pattern=solid color=red);
   refline 200.5 / axis=x;
run;
The forecast plot is shown in Figure 26.8. The last 50 observations are also plotted to provide context, and a reference line is drawn between the historical and forecast periods.

Figure 26.8 Plot of Forecasts
Controlling Printed Output

By default, the STATESPACE procedure produces a large amount of printed output. The NOPRINT option suppresses all printed output. You can suppress the printed output for the autoregressive model selection process with the PRINTOUT=NONE option. The descriptive statistics and state space model estimation output are still printed when PRINTOUT=NONE is specified. You can produce more detailed output with the PRINTOUT=LONG option and by specifying the printing control options CANCORR, COVB, and PRINT.
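For example, the following sketch suppresses the autoregressive order selection output for the example model:

proc statespace data=in out=out lead=10 printout=none;
   var x(1) y(1);
   id t;
run;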
Specifying the State Space Model

Instead of allowing the STATESPACE procedure to select the model automatically, you can use FORM and RESTRICT statements to specify a state space model.
Specifying the State Vector

Use the FORM statement to control the form of the state vector. You can use this feature to force PROC STATESPACE to estimate and forecast a model different from the model it would select automatically. You can also use this feature to reestimate the automatically selected model (possibly with restrictions) without repeating the canonical correlation analysis.

The FORM statement specifies the number of lags of each variable to include in the state vector. For example, the statement FORM X 3; forces the state vector to include \(x_{t|t}\), \(x_{t+1|t}\), and \(x_{t+2|t}\). The following statement specifies the state vector \((x_{t|t}, y_{t|t}, x_{t+1|t})\), which is the same state vector selected in the preceding example:

form x 2 y 1;
You can specify the form for only some of the variables and allow PROC STATESPACE to select the form for the other variables. If only some of the variables are specified in the FORM statement, canonical correlation analysis is used to determine the number of lags included in the state vector for the remaining variables not specified by the FORM statement. If the FORM statement includes specifications for all the variables listed in the VAR statement, the state vector is completely defined and the canonical correlation analysis is not performed.
Restricting the F and G matrices

After you know the form of the state vector, you can use the RESTRICT statement to fix some parameters in the F and G matrices to specified values. One use of this feature is to remove insignificant parameters by restricting them to 0.

In the introductory example shown in the preceding section, the F[2,3] parameter is not significant. (The parameter estimates output shown in Figure 26.6 gives the t statistic for F[2,3] as -0.06. F[3,3] and F[3,1] also have low significance with t < 2.)

The following statements reestimate this model with F[2,3] restricted to 0. The FORM statement is used to specify the state vector and thus bypass the canonical correlation analysis.

   proc statespace data=in out=out lead=10;
      var x(1) y(1);
      id t;
      form x 2 y 1;
      restrict f(2,3)=0;
   run;
The final estimates produced by these statements are shown in Figure 26.9 and Figure 26.10.
Figure 26.9 Results Using RESTRICT Statement

                  The STATESPACE Procedure
          Selected Statespace Form and Fitted Model

                       State Vector

           x(T;T)      y(T;T)      x(T+1;T)

               Estimate of Transition Matrix

           0            0            1
           0.290051     0.467468     0
           0.227051     0.226139     0.26436

               Input Matrix for Innovation

                   1            0
                   0            1
                   0.256826     0.202022

              Variance Matrix for Innovation

                   0.945175     0.100696
                   0.100696     1.014733
Figure 26.10 Restricted Parameter Estimates

                   Parameter Estimates

                              Standard
   Parameter     Estimate     Error        t Value

   F(2,1)        0.290051     0.063904      4.54
   F(2,2)        0.467468     0.060430      7.74
   F(3,1)        0.227051     0.125221      1.81
   F(3,2)        0.226139     0.111711      2.02
   F(3,3)        0.264360     0.299537      0.88
   G(3,1)        0.256826     0.070994      3.62
   G(3,2)        0.202022     0.068507      2.95
Syntax: STATESPACE Procedure

The STATESPACE procedure uses the following statements:
   PROC STATESPACE options ;
      BY variable . . . ;
      FORM variable value . . . ;
      ID variable ;
      INITIAL F(row,column)=value . . . G(row,column)=value . . . ;
      RESTRICT F(row,column)=value . . . G(row,column)=value . . . ;
      VAR variable (difference, difference, . . . ) . . . ;
Functional Summary

Table 26.1 summarizes the statements and options used by PROC STATESPACE.

Table 26.1  STATESPACE Functional Summary

Description                                       Statement          Option

Input Data Set Options
  specify the input data set                      PROC STATESPACE    DATA=
  prevent subtraction of sample mean              PROC STATESPACE    NOCENTER
  specify the ID variable                         ID
  specify the observed series and differencing    VAR

Options for Autoregressive Estimates
  specify the maximum order                       PROC STATESPACE    ARMAX=
  specify maximum lag for autocovariances         PROC STATESPACE    LAGMAX=
  output only minimum AIC model                   PROC STATESPACE    MINIC
  specify the amount of detail printed            PROC STATESPACE    PRINTOUT=
  write preliminary AR models to a data set       PROC STATESPACE    OUTAR=

Options for Canonical Correlation Analysis
  print the sequence of canonical correlations    PROC STATESPACE    CANCORR
  specify upper limit of dimension of state
    vector                                        PROC STATESPACE    DIMMAX=
  specify the minimum number of lags              PROC STATESPACE    PASTMIN=
  specify the multiplier of the degrees of
    freedom                                       PROC STATESPACE    SIGCORR=

Options for State Space Model Estimation
  specify starting values                         INITIAL
  print covariance matrix of parameter
    estimates                                     PROC STATESPACE    COVB
  specify the convergence criterion               PROC STATESPACE    DETTOL=
  specify the convergence criterion               PROC STATESPACE    PARMTOL=
  print the details of the iterations             PROC STATESPACE    ITPRINT
  specify an upper limit of the number of lags    PROC STATESPACE    KLAG=
  specify maximum number of iterations allowed    PROC STATESPACE    MAXIT=
  suppress the final estimation                   PROC STATESPACE    NOEST
  write the state space model parameter
    estimates to an output data set               PROC STATESPACE    OUTMODEL=
  use conditional least squares for final
    estimates                                     PROC STATESPACE    RESIDEST
  specify criterion for testing for singularity   PROC STATESPACE    SINGULAR=

Options for Forecasting
  start forecasting before end of the input data  PROC STATESPACE    BACK=
  specify the time interval between observations  PROC STATESPACE    INTERVAL=
  specify multiple periods in the time series     PROC STATESPACE    INTPER=
  specify how many periods to forecast            PROC STATESPACE    LEAD=
  specify the output data set for forecasts       PROC STATESPACE    OUT=
  print forecasts                                 PROC STATESPACE    PRINT

Options to Specify the State Space Model
  specify the state vector                        FORM
  specify the parameter values                    RESTRICT

BY Groups
  specify BY-group processing                     BY

Printing
  suppress all printed output                     PROC STATESPACE    NOPRINT
PROC STATESPACE Statement

   PROC STATESPACE options ;

The following options can be specified in the PROC STATESPACE statement.

Printing Options

NOPRINT
   suppresses all printed output.
Input Data Options

DATA=SAS-data-set
   specifies the name of the SAS data set to be used by the procedure. If the DATA= option is omitted, the most recently created SAS data set is used.

LAGMAX=k
   specifies the number of lags for which the sample autocovariance matrix is computed. The LAGMAX= option controls the number of lags printed in the schematic representation of the autocorrelations.
   The sample autocovariance matrix of lag i, denoted as $C_i$, is computed as

   $$C_i = \frac{1}{N-1} \sum_{t=1+i}^{N} x_t x_{t-i}'$$
   where $x_t$ is the differenced and centered data and N is the number of observations. (If the NOCENTER option is specified, 1 is not subtracted from N.) LAGMAX=k specifies that $C_0$ through $C_k$ are computed. The default is LAGMAX=10.

NOCENTER
   prevents subtraction of the sample mean from the input series (after any specified differencing) before the analysis.
Options for Preliminary Autoregressive Models

ARMAX=n
   specifies the maximum order of the preliminary autoregressive models. The ARMAX= option controls the autoregressive orders for which information criteria are printed, and controls the number of lags printed in the schematic representation of partial autocorrelations. The default is ARMAX=10. See the section "Preliminary Autoregressive Models" on page 1738 for details.

MINIC
   writes to the OUTAR= data set only the preliminary Yule-Walker estimates for the VAR model that produces the minimum AIC. See the section "OUTAR= Data Set" on page 1749 for details.

OUTAR=SAS-data-set
   writes the Yule-Walker estimates of the preliminary autoregressive models to a SAS data set. See the section "OUTAR= Data Set" on page 1749 for details.

PRINTOUT=SHORT | LONG | NONE
   determines the amount of detail printed. PRINTOUT=LONG prints the lagged covariance matrices, the partial autoregressive matrices, and estimates of the residual covariance matrices from the sequence of autoregressive models. PRINTOUT=NONE suppresses the output for the preliminary autoregressive models. The descriptive statistics and state space model estimation output are still printed when PRINTOUT=NONE is specified. PRINTOUT=SHORT is the default.
Canonical Correlation Analysis Options

CANCORR
   prints the canonical correlations and information criterion for each candidate state vector considered. See the section "Canonical Correlation Analysis Options" on page 1731 for details.
DIMMAX=n
   specifies the upper limit to the dimension of the state vector. The DIMMAX= option can be used to limit the size of the model selected. The default is DIMMAX=10.

PASTMIN=n
   specifies the minimum number of lags to include in the canonical correlation analysis. The default is PASTMIN=0. See the section "Canonical Correlation Analysis Options" on page 1731 for details.

SIGCORR=value
   specifies the multiplier of the degrees of freedom for the penalty term in the information criterion used to select the state space form. The default is SIGCORR=2. The larger the value of the SIGCORR= option, the smaller the state vector tends to be. Hence, a large value causes a simpler model to be fit. See the section "Canonical Correlation Analysis Options" on page 1731 for details.
State Space Model Estimation Options

COVB
   prints the inverse of the observed information matrix for the parameter estimates. This matrix is an estimate of the covariance matrix for the parameter estimates.

DETTOL=value
   specifies the convergence criterion. The DETTOL= and PARMTOL= option values are used together to test for convergence of the estimation process. If, during an iteration, the relative change of the parameter estimates is less than the PARMTOL= value and the relative change of the determinant of the innovation variance matrix is less than the DETTOL= value, then iteration ceases and the current estimates are accepted. The default is DETTOL=1E-5.

ITPRINT
   prints the iterations during the estimation process.

KLAG=n
   sets an upper limit for the number of lags of the sample autocovariance matrix used in computing the approximate likelihood function. If the data have a strong moving average character, a larger KLAG= value might be necessary to obtain good estimates. The default is KLAG=15. See the section "Parameter Estimation" on page 1744 for details.

MAXIT=n
   sets an upper limit to the number of iterations in the maximum likelihood or conditional least squares estimation. The default is MAXIT=50.

NOEST
   suppresses the final maximum likelihood estimation of the selected model.

OUTMODEL=SAS-data-set
   writes the parameter estimates and their standard errors to a SAS data set. See the section "OUTMODEL= Data Set" on page 1750 for details.
PARMTOL=value
   specifies the convergence criterion. The DETTOL= and PARMTOL= option values are used together to test for convergence of the estimation process. If, during an iteration, the relative change of the parameter estimates is less than the PARMTOL= value and the relative change of the determinant of the innovation variance matrix is less than the DETTOL= value, then iteration ceases and the current estimates are accepted. The default is PARMTOL=0.001.

RESIDEST
   computes the final estimates by using conditional least squares on the raw data. This type of estimation might be more stable than the default maximum likelihood method but is usually more computationally expensive. See the section "Parameter Estimation" on page 1744 for details about the conditional least squares method.

SINGULAR=value
   specifies the criterion for testing for singularity of a matrix. A matrix is declared singular if a scaled pivot is less than the SINGULAR= value when sweeping the matrix. The default is SINGULAR=1E-7.
Forecasting Options

BACK=n
   starts forecasting n periods before the end of the input data. The BACK= option value must not be greater than the number of observations. The default is BACK=0.

INTERVAL=interval
   specifies the time interval between observations. The INTERVAL= value is used in conjunction with the ID variable to check that the input data are in order and have no missing periods. The INTERVAL= option is also used to extrapolate the ID values past the end of the input data. See Chapter 4, "Date Intervals, Formats, and Functions," for details about the INTERVAL= values allowed.

INTPER=n
   specifies that each input observation corresponds to n time periods. For example, the options INTERVAL=MONTH and INTPER=2 specify bimonthly data and are equivalent to specifying INTERVAL=MONTH2. If the INTERVAL= option is not specified, the INTPER= option controls the increment used to generate ID values for the forecast observations. The default is INTPER=1.

LEAD=n
   specifies how many forecast observations are produced. The forecasts start at the point set by the BACK= option. The default is LEAD=0, which produces no forecasts.

OUT=SAS-data-set
   writes the residuals, actual values, forecasts, and forecast standard errors to a SAS data set. See the section "OUT= Data Set" on page 1749 for details.

PRINT
   prints the forecasts.
BY Statement

   BY variable . . . ;

A BY statement can be used with the STATESPACE procedure to obtain separate analyses on observations in groups defined by the BY variables.
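For example, the following statements (a sketch; the data set name IN and the BY variable REGION are hypothetical) fit and forecast a separate state space model for each BY group. As with other SAS procedures, the input data set must be sorted by the BY variables.

   proc sort data=in;
      by region;
   run;

   proc statespace data=in out=out lead=10;
      by region;
      var x(1) y(1);
      id t;
   run;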
FORM Statement

   FORM variable value . . . ;

The FORM statement specifies the number of times a variable is included in the state vector. Values can be specified for any variable listed in the VAR statement. If a value is specified for each variable in the VAR statement, the state vector for the state space model is entirely specified, and automatic selection of the state space model is not performed.

The FORM statement forces the state vector, $z_t$, to contain a specific variable a given number of times. For example, if Y is one of the variables in $x_t$, then the statement

   form y 3;

forces the state vector to contain $y_{t|t}$, $y_{t+1|t}$, and $y_{t+2|t}$, possibly along with other variables.

The following statements illustrate the use of the FORM statement:

   proc statespace data=in;
      var x y;
      form x 3 y 2;
   run;

These statements fit a state space model with the following state vector:

$$z_t = \begin{bmatrix} x_{t|t} \\ y_{t|t} \\ x_{t+1|t} \\ y_{t+1|t} \\ x_{t+2|t} \end{bmatrix}$$
ID Statement

   ID variable ;

The ID statement specifies a variable that identifies observations in the input data set. The variable specified in the ID statement is included in the OUT= data set. The values of the ID variable are
extrapolated for the forecast observations based on the values of the INTERVAL= and INTPER= options.
INITIAL Statement

   INITIAL F(row,column)=value . . . G(row,column)=value . . . ;

The INITIAL statement gives initial values to the specified elements of the F and G matrices. These initial values are used as starting values for the iterative estimation.

Parts of the F and G matrices represent fixed structural identities. If an element specified is a fixed structural element instead of a free parameter, the corresponding initialization is ignored.

The following is an example of an INITIAL statement:

   initial f(3,2)=0 g(4,1)=0 g(5,1)=0;
RESTRICT Statement

   RESTRICT F(row,column)=value . . . G(row,column)=value . . . ;

The RESTRICT statement restricts the specified elements of the F and G matrices to the specified values. To use the RESTRICT statement, you need to know the form of the model. Either specify the form of the model with the FORM statement, or do a preliminary run (perhaps with the NOEST option) to find the form of the model that PROC STATESPACE selects for the data.

The following is an example of a RESTRICT statement:

   restrict f(3,2)=0 g(4,1)=0 g(5,1)=0;

Parts of the F and G matrices represent fixed structural identities. If a restriction is specified for an element that is a fixed structural element instead of a free parameter, the restriction is ignored.
VAR Statement

   VAR variable (difference, difference, . . . ) . . . ;

The VAR statement specifies the variables in the input data set to model and forecast. The VAR statement also specifies differencing of the input variables. The VAR statement is required.
Differencing is specified by following the variable name with a list of difference periods separated by commas. See the section "Stationarity and Differencing" on page 1736 for more information about differencing of input variables.

The order in which variables are listed in the VAR statement controls the order in which variables are included in the state vector. Usually, potential inputs should be listed before potential outputs.

For example, assuming the input data are monthly, the following VAR statement specifies modeling and forecasting of the one period and seasonal second difference of X and Y:

   var x(1,12) y(1,12);
In this example, the vector time series analyzed is

$$x_t = \begin{bmatrix} (1-B)(1-B^{12})X_t - \bar{x} \\ (1-B)(1-B^{12})Y_t - \bar{y} \end{bmatrix}$$

where B represents the backshift operator and $\bar{x}$ and $\bar{y}$ represent the means of the differenced series. If the NOCENTER option is specified, the mean differences are not subtracted.
Details: STATESPACE Procedure
Missing Values

The STATESPACE procedure does not support missing values. The procedure uses the first contiguous group of observations with no missing values for any of the VAR statement variables. Observations at the beginning of the data set with missing values for any VAR statement variable are not used or included in the output data set.
Stationarity and Differencing

The state space model used by the STATESPACE procedure assumes that the time series are stationary. Hence, the data should be checked for stationarity. One way to check for stationarity is to plot the series. A graph of series over time can show a time trend or variability changes.

You can also check stationarity by using the sample autocorrelation functions displayed by the ARIMA procedure. The autocorrelation functions of nonstationary series tend to decay slowly. See Chapter 7, "The ARIMA Procedure," for more information.

Another alternative is to use the STATIONARITY= option in the IDENTIFY statement in PROC ARIMA to apply Dickey-Fuller tests for unit roots in the time series. See Chapter 7, "The ARIMA Procedure," for more information about Dickey-Fuller unit root tests.
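For example, the following statements (a sketch; the data set and variable names are illustrative) use the STATIONARITY= option in PROC ARIMA to apply augmented Dickey-Fuller tests to the series X:

   proc arima data=in;
      identify var=x stationarity=(adf);
   run;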
The most popular way to transform a nonstationary series to stationarity is by differencing. Differencing of the time series is specified in the VAR statement. For example, to take a simple first difference of the series X, use this statement:

   var x(1);
In this example, the change in X from one period to the next is analyzed. When the series has a seasonal pattern, differencing at a period equal to the length of the seasonal cycle can be desirable. For example, suppose the variable X is measured quarterly and shows a seasonal cycle over the year. You can use the following statement to analyze the series of changes from the same quarter in the previous year:

   var x(4);
To difference twice, add another differencing period to the list. For example, the following statement analyzes the series of second differences $(X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}$:

   var x(1,1);
The following statement analyzes the seasonal second difference series:

   var x(1,4);

The series that is being modeled is the 1-period difference of the 4-period difference: $(X_t - X_{t-4}) - (X_{t-1} - X_{t-5}) = X_t - X_{t-1} - X_{t-4} + X_{t-5}$.

Another way to obtain stationary series is to use a regression on time to detrend the data. If the time series has a deterministic linear trend, regressing the series on time produces residuals that should be stationary. The following statements write the residuals of X and Y to the variables RX and RY in the output data set DETREND.

   data a;
      set a;
      t = _n_;
   run;

   proc reg data=a;
      model x y = t;
      output out=detrend r=rx ry;
   run;
You then use PROC STATESPACE to forecast the detrended series RX and RY. A disadvantage of this method is that you need to add the trend back to the forecast series in an additional step. A more serious disadvantage of the detrending method is that it assumes a deterministic trend. In practice, most time series appear to have a stochastic rather than a deterministic trend. Differencing is a more flexible and often more appropriate method.
There are several other methods to handle nonstationary time series. For more information and examples, see Brockwell and Davis (1991).
Preliminary Autoregressive Models

After computing the sample autocovariance matrices, PROC STATESPACE fits a sequence of vector autoregressive models. These preliminary autoregressive models are used to estimate the autoregressive order of the process and limit the order of the autocovariances considered in the state vector selection process.
Yule-Walker Equations for Forward and Backward Models

Unlike a univariate autoregressive model, a multivariate autoregressive model has different forms, depending on whether the present observation is being predicted from the past observations or from the future observations.

Let $x_t$ be the r-component stationary time series given by the VAR statement after differencing and subtracting the vector of sample means. (If the NOCENTER option is specified, the mean is not subtracted.) Let n be the number of observations of $x_t$ from the input data set.

Let $e_t$ be a vector white noise sequence with mean vector 0 and variance matrix $\Sigma_p$, and let $n_t$ be a vector white noise sequence with mean vector 0 and variance matrix $\Omega_p$. Let p be the order of the vector autoregressive model for $x_t$.

The forward autoregressive form based on the past observations is written as follows:

$$x_t = \sum_{i=1}^{p} \Phi_i^p x_{t-i} + e_t$$

The backward autoregressive form based on the future observations is written as follows:

$$x_t = \sum_{i=1}^{p} \Psi_i^p x_{t+i} + n_t$$
Letting E denote the expected value operator, the autocovariance sequence for the $x_t$ series, $\Gamma_i$, is

$$\Gamma_i = E\, x_t x_{t-i}'$$

The Yule-Walker equations for the autoregressive model that matches the first p elements of the autocovariance sequence are

$$\begin{bmatrix} \Gamma_0 & \Gamma_1 & \cdots & \Gamma_{p-1} \\ \Gamma_1' & \Gamma_0 & \cdots & \Gamma_{p-2} \\ \vdots & \vdots & & \vdots \\ \Gamma_{p-1}' & \Gamma_{p-2}' & \cdots & \Gamma_0 \end{bmatrix} \begin{bmatrix} \Phi_1^p \\ \Phi_2^p \\ \vdots \\ \Phi_p^p \end{bmatrix} = \begin{bmatrix} \Gamma_1 \\ \Gamma_2 \\ \vdots \\ \Gamma_p \end{bmatrix}$$
and

$$\begin{bmatrix} \Gamma_0 & \Gamma_1' & \cdots & \Gamma_{p-1}' \\ \Gamma_1 & \Gamma_0 & \cdots & \Gamma_{p-2}' \\ \vdots & \vdots & & \vdots \\ \Gamma_{p-1} & \Gamma_{p-2} & \cdots & \Gamma_0 \end{bmatrix} \begin{bmatrix} \Psi_1^p \\ \Psi_2^p \\ \vdots \\ \Psi_p^p \end{bmatrix} = \begin{bmatrix} \Gamma_1' \\ \Gamma_2' \\ \vdots \\ \Gamma_p' \end{bmatrix}$$
Here $\Phi_i^p$ are the coefficient matrices for the past observation form of the vector autoregressive model, and $\Psi_i^p$ are the coefficient matrices for the future observation form. More information about the Yule-Walker equations in the multivariate setting can be found in Whittle (1963) and Ansley and Newbold (1979).

The innovation variance matrices for the two forms can be written as follows:

$$\Sigma_p = \Gamma_0 - \sum_{i=1}^{p} \Phi_i^p \Gamma_i'$$

$$\Omega_p = \Gamma_0 - \sum_{i=1}^{p} \Psi_i^p \Gamma_i$$
The autoregressive models are fit to the data by using the preceding Yule-Walker equations with $\Gamma_i$ replaced by the sample covariance sequence $C_i$. The covariance matrices are calculated as

$$C_i = \frac{1}{N-1} \sum_{t=i+1}^{N} x_t x_{t-i}'$$
Let $\hat\Phi_p$, $\hat\Psi_p$, $\hat\Sigma_p$, and $\hat\Omega_p$ represent the Yule-Walker estimates of $\Phi_p$, $\Psi_p$, $\Sigma_p$, and $\Omega_p$, respectively. These matrices are written to an output data set when the OUTAR= option is specified.

When the PRINTOUT=LONG option is specified, the sequence of matrices $\hat\Sigma_p$ and the corresponding correlation matrices are printed. The sequence of matrices $\hat\Sigma_p$ is used to compute Akaike information criteria for selection of the autoregressive order of the process.
Akaike Information Criterion

The Akaike information criterion (AIC) is defined as -2(maximum of log likelihood) + 2(number of parameters). Since the vector autoregressive models are estimated from the Yule-Walker equations, not by maximum likelihood, the exact likelihood values are not available for computing the AIC. However, for the vector autoregressive model the maximum of the log likelihood can be approximated as

$$\ln(L) \approx -\frac{n}{2} \ln(|\hat\Sigma_p|)$$

Thus, the AIC for the order p model is computed as

$$\mathrm{AIC}_p = n \ln(|\hat\Sigma_p|) + 2pr^2$$
You can use the printed AIC array to compute a likelihood ratio test of the autoregressive order. The log-likelihood ratio test statistic for testing the order p model against the order p-1 model is

$$-n \ln(|\hat\Sigma_p|) + n \ln(|\hat\Sigma_{p-1}|)$$

This quantity is asymptotically distributed as a $\chi^2$ with $r^2$ degrees of freedom if the series is autoregressive of order p-1. It can be computed from the AIC array as

$$\mathrm{AIC}_{p-1} - \mathrm{AIC}_p + 2r^2$$
You can evaluate the significance of these test statistics with the PROBCHI function in a SAS DATA step or with a $\chi^2$ table.
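For example, the following DATA step is a minimal sketch of this computation for a two-variable system (r = 2); the AIC values shown are the lag 1 and lag 2 values from Output 26.1.1 and serve only as an illustration:

   data _null_;
      r    = 2;                          /* number of series */
      aic1 = -1033.57;                   /* AIC for the order 1 model */
      aic2 = -1632.96;                   /* AIC for the order 2 model */
      lr   = aic1 - aic2 + 2*r**2;       /* likelihood ratio statistic */
      pvalue = 1 - probchi(lr, r**2);    /* upper-tail chi-square probability */
      put lr= pvalue=;
   run;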
Determining the Autoregressive Order

Although the autoregressive models can be used for prediction, their primary value is to aid in the selection of a suitable portion of the sample covariance matrix for use in computing canonical correlations. If the multivariate time series $x_t$ is of autoregressive order p, then the vector of past values to lag p is considered to contain essentially all the information relevant for prediction of future values of the time series.

By default, PROC STATESPACE selects the order p that produces the autoregressive model with the smallest $\mathrm{AIC}_p$. If the value p for the minimum $\mathrm{AIC}_p$ is less than the value of the PASTMIN= option, then p is set to the PASTMIN= value. Alternatively, you can use the ARMAX= and PASTMIN= options to force PROC STATESPACE to use an order you select.
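For example, the following statements (a sketch; the data set name is illustrative) force the procedure to use an autoregressive order of 3 by setting both the maximum and minimum order options to 3:

   proc statespace data=in out=out armax=3 pastmin=3;
      var x(1) y(1);
   run;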
Significance Limits for Partial Autocorrelations

The STATESPACE procedure prints a schematic representation of the partial autocorrelation matrices that indicates which partial autocorrelations are significantly greater than or significantly less than 0. Figure 26.11 shows an example of this table.

Figure 26.11 Significant Partial Autocorrelations

      Schematic Representation of Partial Autocorrelations

 Name/Lag   1    2    3    4    5    6    7    8    9    10
 x          ++   +.   ..   ..   ..   ..   ..   ..   ..   ..
 y          ++   ..   ..   ..   ..   ..   ..   ..   ..   ..

 + is > 2*std error,  - is < -2*std error,  . is between
bp The partial autocorrelations are from the sample partial autoregressive matrices ˆ p . The standard errors used for the significance limits of the partial autocorrelations are computed from the sequence of matrices †p and p .
Under the assumption that the observed series arises from an autoregressive process of order p-1, the pth sample partial autoregressive matrix $\hat\Phi_p^p$ has an asymptotic variance matrix $\frac{1}{n}\Omega_p^{-1} \otimes \Sigma_p$.

The significance limits for $\hat\Phi_p^p$ used in the schematic plot of the sample partial autoregressive sequence are derived by replacing $\Omega_p$ and $\Sigma_p$ with their sample estimators to produce the variance estimate, as follows:

$$\widehat{\mathrm{Var}}\left(\hat\Phi_p^p\right) = \frac{1}{n - rp}\,\hat\Omega_p^{-1} \otimes \hat\Sigma_p$$
Canonical Correlation Analysis

Given the order p, let $p_t$ be the vector of current and past values relevant to prediction of $x_{t+1}$:

$$p_t = (x_t', x_{t-1}', \ldots, x_{t-p}')'$$

Let $f_t$ be the vector of current and future values:

$$f_t = (x_t', x_{t+1}', \ldots, x_{t+p}')'$$

In the canonical correlation analysis, consider submatrices of the sample covariance matrix of $p_t$ and $f_t$. This covariance matrix, V, has a block Hankel form:

$$V = \begin{bmatrix} C_0' & C_1' & C_2' & \cdots & C_p' \\ C_1' & C_2' & C_3' & \cdots & C_{p+1}' \\ \vdots & \vdots & \vdots & & \vdots \\ C_p' & C_{p+1}' & C_{p+2}' & \cdots & C_{2p}' \end{bmatrix}$$
State Vector Selection Process

The canonical correlation analysis forms a sequence of potential state vectors $z_t^j$. Examine a sequence $f_t^j$ of subvectors of $f_t$, form the submatrix $V_j$ that consists of the rows and columns of V that correspond to the components of $f_t^j$, and compute its canonical correlations.

The smallest canonical correlation of $V_j$ is then used in the selection of the components of the state vector. The selection process is described in the following discussion. For more details about this process, see Akaike (1976).

In the following discussion, the notation $x_{t+k|t}$ denotes the wide sense conditional expectation (best linear predictor) of $x_{t+k}$, given all $x_s$ with s less than or equal to t. In the notation $x_{i,t+1}$, the first subscript denotes the ith component of $x_{t+1}$.

The initial state vector $z_t^1$ is set to $x_t$. The sequence $f_t^j$ is initialized by setting

$$f_t^1 = (z_t^{1\prime}, x_{1,t+1|t})' = (x_t', x_{1,t+1|t})'$$
That is, start by considering whether to add $x_{1,t+1|t}$ to the initial state vector $z_t^1$.

The procedure forms the submatrix $V_1$ that corresponds to $f_t^1$ and computes its canonical correlations. Denote the smallest canonical correlation of $V_1$ as $\rho_{min}$. If $\rho_{min}$ is significantly greater than 0, $x_{1,t+1|t}$ is added to the state vector.

If the smallest canonical correlation of $V_1$ is not significantly greater than 0, then a linear combination of $f_t^1$ is uncorrelated with the past, $p_t$. Assuming that the determinant of $C_0$ is not 0 (that is, no input series is a constant), you can take the coefficient of $x_{1,t+1|t}$ in this linear combination to be 1. Denote the coefficients of $z_t^1$ in this linear combination as $\ell$. This gives the relationship:

$$x_{1,t+1|t} = \ell' x_t$$

Therefore, the current state vector already contains all the past information useful for predicting $x_{1,t+1}$ and any greater leads of $x_{1,t}$. The variable $x_{1,t+1|t}$ is not added to the state vector, nor are any terms $x_{1,t+k|t}$ considered as possible components of the state vector. The variable $x_1$ is no longer active for state vector selection.

The process described for $x_{1,t+1|t}$ is repeated for the remaining elements of $f_t$. The next candidate for inclusion in the state vector is the next component of $f_t$ that corresponds to an active variable. Components of $f_t$ that correspond to inactive variables that produced a zero $\rho_{min}$ in a previous step are skipped.

Denote the next candidate as $x_{l,t+k|t}$. The vector $f_t^j$ is formed from the current state vector and $x_{l,t+k|t}$ as follows:

$$f_t^j = (z_t^{j\prime}, x_{l,t+k|t})'$$

The matrix $V_j$ is formed from $f_t^j$ and its canonical correlations are computed. The smallest canonical correlation of $V_j$ is judged to be either greater than or equal to 0. If it is judged to be greater than 0, $x_{l,t+k|t}$ is added to the state vector. If it is judged to be 0, then a linear combination of $f_t^j$ is uncorrelated with the $p_t$, and the variable $x_l$ is now inactive.

The state vector selection process continues until no active variables remain.
Testing Significance of Canonical Correlations

For each step in the canonical correlation sequence, the significance of the smallest canonical correlation $\rho_{min}$ is judged by an information criterion from Akaike (1976). This information criterion is

$$-n \ln(1 - \rho_{min}^2) - \lambda\,(r(p+1) - q + 1)$$

where q is the dimension of $f_t^j$ at the current step, r is the order of the state vector, p is the order of the vector autoregressive process, and $\lambda$ is the value of the SIGCORR= option. The default is SIGCORR=2. If this information criterion is less than or equal to 0, $\rho_{min}$ is taken to be 0; otherwise, it is taken to be significantly greater than 0. (Do not confuse this information criterion with the AIC.)

Variables in $x_{t+p|t}$ are not added in the model, even with positive information criterion, because of the singularity of V. You can force the consideration of more candidate state variables by increasing the size of the V matrix by specifying a PASTMIN= option value larger than p.
Printing the Canonical Correlations

To print the details of the canonical correlation analysis process, specify the CANCORR option in the PROC STATESPACE statement. The CANCORR option prints the candidate state vectors, the canonical correlations, and the information criteria for testing the significance of the smallest canonical correlation.

Bartlett's $\chi^2$ and its degrees of freedom are also printed when the CANCORR option is specified. The formula used for Bartlett's $\chi^2$ is

$$\chi^2 = -(n - 0.5(r(p+1) - q + 1))\,\ln(1 - \rho_{min}^2)$$

with $r(p+1) - q + 1$ degrees of freedom.

Figure 26.12 shows the output of the CANCORR option for the introductory example shown in the "Getting Started: STATESPACE Procedure" on page 1718.

   proc statespace data=in out=out lead=10 cancorr;
      var x(1) y(1);
      id t;
   run;
Figure 26.12 Canonical Correlations Analysis

                 The STATESPACE Procedure
              Canonical Correlations Analysis

                               Information     Chi
   x(T;T)  y(T;T)  x(T+1;T)    Criterion       Square     DF

   1       1       0.237045    3.566167        11.4505    4
New variables are added to the state vector if the information criteria are positive. In this example, $y_{t+1|t}$ and $x_{t+2|t}$ are not added to the state space vector because the information criteria for these models are negative. If the information criterion is nearly 0, then you might want to investigate models that arise if the opposite decision is made regarding $\rho_{min}$. This investigation can be accomplished by using a FORM statement to specify part or all of the state vector.
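For example, if the information criterion for $y_{t+1|t}$ had been nearly 0, the following statements (a sketch of the opposite decision) would use a FORM statement to force $y_{t+1|t}$ into the state vector:

   proc statespace data=in out=out lead=10;
      var x(1) y(1);
      id t;
      form x 2 y 2;
   run;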
Preliminary Estimates of F

When a candidate variable $x_{l,t+k|t}$ yields a zero $\rho_{min}$ and is not added to the state vector, a linear combination of $f_t^j$ is uncorrelated with the $p_t$. Because of the method used to construct the $f_t^j$ sequence, the coefficient of $x_{l,t+k|t}$ in this linear combination can be taken as 1. Denote the coefficients of $z_t^j$ in this linear combination as $l$. This gives the relationship:

$$x_{l,t+k|t} = l' z_t^j$$
The vector $l$ is used as a preliminary estimate of the first r columns of the row of the transition matrix F corresponding to $x_{l,t+k-1|t}$.
Parameter Estimation

The model is $z_{t+1} = F z_t + G e_{t+1}$, where $e_t$ is a sequence of independent multivariate normal innovations with mean vector 0 and variance $\Sigma_{ee}$. The observed sequence $x_t$ composes the first r components of $z_t$, and thus $x_t = H z_t$, where H is the $r \times s$ matrix $[I_r\ 0]$.

Let E be the $r \times n$ matrix of innovations:

$$E = [\,e_1\ \cdots\ e_n\,]$$

If the number of observations n is reasonably large, the log likelihood L can be approximated up to an additive constant as follows:

$$L = -\frac{n}{2}\ln(|\Sigma_{ee}|) - \frac{1}{2}\,\mathrm{trace}\!\left(\Sigma_{ee}^{-1} E E'\right)$$

The elements of $\Sigma_{ee}$ are taken as free parameters and are estimated as follows:

$$S_0 = \frac{1}{n} E E'$$

Replacing $\Sigma_{ee}$ by $S_0$ in the likelihood equation, the log likelihood, up to an additive constant, is

$$L = -\frac{n}{2}\ln(|S_0|)$$

Letting B be the backshift operator, the formal relation between $x_t$ and $e_t$ is

$$x_t = H(I - BF)^{-1} G e_t$$

$$e_t = (H(I - BF)^{-1} G)^{-1} x_t = \sum_{i=0}^{\infty} \Xi_i x_{t-i}$$

Letting $C_i$ be the ith lagged sample covariance of $x_t$ and neglecting end effects, the matrix $S_0$ is

$$S_0 = \sum_{i,j=0}^{\infty} \Xi_i C_{-i+j} \Xi_j'$$

For the computation of $S_0$, the infinite sum is truncated at the value of the KLAG= option. The value of the KLAG= option should be large enough that the sequence $\Xi_i$ is approximately 0 beyond that point.
Let $\theta$ be the vector of free parameters in the F and G matrices. The derivative of the log likelihood with respect to the parameter $\theta$ is

$$\frac{\partial L}{\partial \theta} = -\frac{n}{2}\,\mathrm{trace}\!\left(S_0^{-1}\frac{\partial S_0}{\partial \theta}\right)$$

The second derivative is

$$\frac{\partial^2 L}{\partial \theta\,\partial \theta'} = \frac{n}{2}\left(\mathrm{trace}\!\left(S_0^{-1}\frac{\partial S_0}{\partial \theta'} S_0^{-1}\frac{\partial S_0}{\partial \theta}\right) - \mathrm{trace}\!\left(S_0^{-1}\frac{\partial^2 S_0}{\partial \theta\,\partial \theta'}\right)\right)$$

Near the maximum, the first term is unimportant and the second term can be approximated to give the following second derivative approximation:

$$\frac{\partial^2 L}{\partial \theta\,\partial \theta'} \cong -n\,\mathrm{trace}\!\left(S_0^{-1}\,\frac{1}{n}\,\frac{\partial E}{\partial \theta}\frac{\partial E'}{\partial \theta'}\right)$$

The first derivative matrix and this second derivative matrix approximation are computed from the sample covariance matrix $C_0$ and the truncated sequence $\Xi_i$. The approximate likelihood function is maximized by a modified Newton-Raphson algorithm that employs these derivative matrices.

The matrix $S_0$ is used as the estimate of the innovation covariance matrix, $\Sigma_{ee}$. The negative of the inverse of the second derivative matrix at the maximum is used as an approximate covariance matrix for the parameter estimates. The standard errors of the parameter estimates printed in the parameter estimates tables are taken from the diagonal of this covariance matrix. The parameter covariance matrix is printed when the COVB option is specified.

If the data are nearly nonstationary, a better estimate of $\Sigma_{ee}$ and the other parameters can sometimes be obtained by specifying the RESIDEST option. The RESIDEST option estimates the parameters by using conditional least squares instead of maximum likelihood. The residuals are computed using the state space equation and the sample mean values of the variables in the model as start-up values. The estimate of $S_0$ is then computed using the residuals from the ith observation on, where i is the maximum number of times any variable occurs in the state vector. A multivariate Gauss-Marquardt algorithm is used to minimize $|S_0|$. See Harvey (1981a) for a further description of this method.
Forecasting Given estimates of F, G, and †ee , forecasts of xt are computed from the conditional expectation of zt . In forecasting, the parameters F, G, and †ee are replaced with the estimates or by values specified in the RESTRICT statement. One-step-ahead forecasting is performed for the observation xt , where tn b. Here n is the number of observations and b is the value of the BACK= option. For the
1746 F Chapter 26: The STATESPACE Procedure
observation xt , where t > n b, m-step-ahead forecasting is performed for m D t forecasts are generated recursively with the initial condition z0 D 0.
n C b. The
The m-step-ahead forecast of $z_{t+m}$ is $z_{t+m|t}$, where $z_{t+m|t}$ denotes the conditional expectation of $z_{t+m}$ given the information available at time t. The m-step-ahead forecast of $x_{t+m}$ is $x_{t+m|t} = H z_{t+m|t}$, where the matrix $H = [I_r\ 0]$.

Let $\Psi_i = F^i G$. Note that the last $s - r$ elements of $z_t$ consist of the elements of $x_{u|t}$ for $u > t$.

The state vector $z_{t+m}$ can be represented as

$$z_{t+m} = F^m z_t + \sum_{i=0}^{m-1} \Psi_i e_{t+m-i}$$

Since $e_{t+i|t} = 0$ for $i > 0$, the m-step-ahead forecast $z_{t+m|t}$ is

$$z_{t+m|t} = F^m z_t = F z_{t+m-1|t}$$

Therefore, the m-step-ahead forecast of $x_{t+m}$ is

$$x_{t+m|t} = H z_{t+m|t}$$

The m-step-ahead forecast error is

$$z_{t+m} - z_{t+m|t} = \sum_{i=0}^{m-1} \Psi_i e_{t+m-i}$$

The variance of the m-step-ahead forecast error is

$$V_{z,m} = \sum_{i=0}^{m-1} \Psi_i \Sigma_{ee} \Psi_i'$$

Letting $V_{z,0} = 0$, the variance of the m-step-ahead forecast error of $z_{t+m}$, $V_{z,m}$, can be computed recursively as follows:

$$V_{z,m} = V_{z,m-1} + \Psi_{m-1} \Sigma_{ee} \Psi_{m-1}'$$

The variance of the m-step-ahead forecast error of $x_{t+m}$ is the $r \times r$ left upper submatrix of $V_{z,m}$; that is,

$$V_{x,m} = H V_{z,m} H'$$

Unless the NOCENTER option is specified, the sample mean vector is added to the forecast. When differencing is specified, the forecasts $x_{t+m|t}$ plus the sample mean vector are integrated back to produce forecasts for the original series.

Let $y_t$ be the original series specified by the VAR statement, with some 0 values appended that correspond to the unobserved past observations.
Let B be the backshift operator, and let $\Delta(B)$ be the $s \times s$ matrix polynomial in the backshift operator that corresponds to the differencing specified by the VAR statement. The off-diagonal elements of $\Delta_i$ are 0. Note that $\Delta_0 = I_s$, where $I_s$ is the $s \times s$ identity matrix. Then $z_t = \Delta(B) y_t$.

This gives the relationship

$$y_t = \Delta^{-1}(B) z_t = \sum_{i=0}^{\infty} \Lambda_i z_{t-i}$$

where $\Delta^{-1}(B) = \sum_{i=0}^{\infty} \Lambda_i B^i$ and $\Lambda_0 = I_s$.

The m-step-ahead forecast of $y_{t+m}$ is

$$y_{t+m|t} = \sum_{i=0}^{m-1} \Lambda_i z_{t+m-i|t} + \sum_{i=m}^{\infty} \Lambda_i z_{t+m-i}$$

The m-step-ahead forecast error of $y_{t+m}$ is

$$\sum_{i=0}^{m-1} \Lambda_i \left( z_{t+m-i} - z_{t+m-i|t} \right) = \sum_{i=0}^{m-1} \left( \sum_{u=0}^{i} \Lambda_u \Psi_{i-u} \right) e_{t+m-i}$$

Letting $V_{y,0} = 0$, the variance of the m-step-ahead forecast error of $y_{t+m}$, $V_{y,m}$, is

$$V_{y,m} = \sum_{i=0}^{m-1} \left( \sum_{u=0}^{i} \Lambda_u \Psi_{i-u} \right) \Sigma_{ee} \left( \sum_{u=0}^{i} \Lambda_u \Psi_{i-u} \right)' = V_{y,m-1} + \left( \sum_{u=0}^{m-1} \Lambda_u \Psi_{m-1-u} \right) \Sigma_{ee} \left( \sum_{u=0}^{m-1} \Lambda_u \Psi_{m-1-u} \right)'$$
Relation of ARMA and State Space Forms

Every state space model has an ARMA representation, and conversely every ARMA model has a state space representation. This section discusses this equivalence. The following material is adapted from Akaike (1974), where there is a more complete discussion. Pham-Dinh-Tuan (1978) also contains a discussion of this material.

Suppose you are given the following ARMA model:

$$\Phi(B) x_t = \Theta(B) e_t$$

or, in more detail,

$$x_t - \Phi_1 x_{t-1} - \cdots - \Phi_p x_{t-p} = e_t + \Theta_1 e_{t-1} + \cdots + \Theta_q e_{t-q} \qquad (1)$$

where $e_t$ is a sequence of independent multivariate normal random vectors with mean 0 and variance matrix $\Sigma_{ee}$, B is the backshift operator ($B x_t = x_{t-1}$), $\Phi(B)$ and $\Theta(B)$ are matrix polynomials in B, and $x_t$ is the observed process.
If the roots of the determinantial equation $|\Phi(B)| = 0$ are outside the unit circle in the complex plane, the model can also be written as

$$x_t = \Phi^{-1}(B)\,\Theta(B)\,e_t = \sum_{i=0}^{\infty} \Psi_i e_{t-i}$$

The $\Psi_i$ matrices are known as the impulse response matrices and can be computed as $\Phi^{-1}(B)\,\Theta(B)$.
You can assume $p > q$ since, if this is not initially true, you can add more terms $\Phi_i$ that are identically 0 without changing the model.

To write this set of equations in a state space form, proceed as follows. Let $x_{t+i|t}$ be the conditional expectation of $x_{t+i}$ given $x_w$ for $w \le t$. The following relations hold:

$$x_{t+i|t} = \sum_{j=i}^{\infty} \Psi_j e_{t+i-j}$$

$$x_{t+i|t+1} = x_{t+i|t} + \Psi_{i-1} e_{t+1}$$
However, from equation (1) you can derive the following relationship:

$$x_{t+p|t} = \Phi_1 x_{t+p-1|t} + \cdots + \Phi_p x_t \qquad (2)$$

Hence, when $i = p$, you can substitute for $x_{t+p|t}$ in the right-hand side of equation (2) and close the system of equations.

This substitution results in the following model in the state space form $z_{t+1} = F z_t + G e_{t+1}$:

$$\begin{bmatrix} x_{t+1} \\ x_{t+2|t+1} \\ \vdots \\ x_{t+p|t+1} \end{bmatrix} = \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ \Phi_p & \Phi_{p-1} & \cdots & & \Phi_1 \end{bmatrix} \begin{bmatrix} x_t \\ x_{t+1|t} \\ \vdots \\ x_{t+p-1|t} \end{bmatrix} + \begin{bmatrix} I \\ \Psi_1 \\ \vdots \\ \Psi_{p-1} \end{bmatrix} e_{t+1}$$
Note that the state vector $z_t$ is composed of conditional expectations of $x_t$ and the first r components of $z_t$ are equal to $x_t$.

The state space form can be cast into an ARMA form by solving the system of difference equations for the first r components.

When converting from an ARMA form to a state space form, you can generate a state vector larger than needed; that is, the state space model might not be a minimal representation. When going from a state space form to an ARMA form, you can have nontrivial common factors in the autoregressive and moving average operators that yield an ARMA model larger than necessary.

If the state space form used is not a minimal representation, some but not all components of $x_{t+i|t}$ might be linearly dependent. This situation corresponds to $[\Phi_p\ \Theta_{p-1}]$ being of less than full rank when $\Phi(B)$ and $\Theta(B)$ have no common nontrivial left factors. In this case, $z_t$ consists of a subset of the possible components of $[x_{t+i|t}]$, $i = 1, 2, \ldots, p-1$. However, once a component of $x_{t+i|t}$ (for example, the jth one) is linearly dependent on the previous conditional expectations, then all subsequent jth components of $x_{t+k|t}$ for $k > i$ must also be linearly dependent. Note that in this case, equivalent but seemingly different structures can arise if the order of the components within $x_t$ is changed.
OUT= Data Set

The forecasts are contained in the output data set specified by the OUT= option in the PROC STATESPACE statement. The OUT= data set contains the following variables:

- the BY variables

- the ID variable

- the VAR statement variables. These variables contain the actual values from the input data set.

- FORi, numeric variables that contain the forecasts. The variable FORi contains the forecasts for the ith variable in the VAR statement list. Forecasts are one-step-ahead predictions until the end of the data or until the observation specified by the BACK= option.

- RESi, numeric variables that contain the residual for the forecast of the ith variable in the VAR statement list. For forecast observations, the actual values are missing and the RESi variables contain missing values.

- STDi, numeric variables that contain the standard deviation for the forecast of the ith variable in the VAR statement list. The values of the STDi variables can be used to construct univariate confidence limits for the corresponding forecasts. However, such confidence limits do not take into account the covariance of the forecasts.
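For example, the following DATA step (a sketch that assumes the first VAR statement variable and an approximate 95% level based on the normal distribution) computes such univariate limits from FOR1 and STD1:

   data limits;
      set out;
      l95 = for1 - 1.96*std1;   /* approximate 95% lower limit */
      u95 = for1 + 1.96*std1;   /* approximate 95% upper limit */
   run;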
OUTAR= Data Set

The OUTAR= data set contains the estimates of the preliminary autoregressive models. The OUTAR= data set contains the following variables:

- ORDER, a numeric variable that contains the order p of the autoregressive model that the observation represents

- AIC, a numeric variable that contains the value of the information criterion $\mathrm{AIC}_p$

- SIGFl, numeric variables that contain the estimates of the innovation covariance matrices for the forward autoregressive models. The variable SIGFl contains the lth column of $\hat\Sigma_p$ in the observations with ORDER=p.

- SIGBl, numeric variables that contain the estimates of the innovation covariance matrices for the backward autoregressive models. The variable SIGBl contains the lth column of $\hat\Omega_p$ in the observations with ORDER=p.

- FORk_l, numeric variables that contain the estimates of the autoregressive parameter matrices for the forward models. The variable FORk_l contains the lth column of the lag k autoregressive parameter matrix $\hat\Phi_k^p$ in the observations with ORDER=p.
- BACk_l, numeric variables that contain the estimates of the autoregressive parameter matrices for the backward models. The variable BACk_l contains the lth column of the lag k autoregressive parameter matrix $\hat\Psi_k^p$ in the observations with ORDER=p.

The estimates for the order p autoregressive model can be selected as those observations with ORDER=p. Within these observations, the k,lth element of $\Phi_i^p$ is given by the value of the FORi_l variable in the kth observation. The k,lth element of $\Psi_i^p$ is given by the value of the BACi_l variable in the kth observation. The k,lth element of $\Sigma_p$ is given by SIGFl in the kth observation. The k,lth element of $\Omega_p$ is given by SIGBl in the kth observation.

Table 26.2 shows an example of the OUTAR= data set, with ARMAX=3 and $x_t$ of dimension 2. In Table 26.2, $(i,j)$ indicates the i,jth element of the matrix.
Table 26.2  Values in the OUTAR= Data Set

Obs  ORDER  AIC    SIGF1     SIGF2     SIGB1     SIGB2
1    0      AIC_0  Σ_0(1,1)  Σ_0(1,2)  Ω_0(1,1)  Ω_0(1,2)
2    0      AIC_0  Σ_0(2,1)  Σ_0(2,2)  Ω_0(2,1)  Ω_0(2,2)
3    1      AIC_1  Σ_1(1,1)  Σ_1(1,2)  Ω_1(1,1)  Ω_1(1,2)
4    1      AIC_1  Σ_1(2,1)  Σ_1(2,2)  Ω_1(2,1)  Ω_1(2,2)
5    2      AIC_2  Σ_2(1,1)  Σ_2(1,2)  Ω_2(1,1)  Ω_2(1,2)
6    2      AIC_2  Σ_2(2,1)  Σ_2(2,2)  Ω_2(2,1)  Ω_2(2,2)
7    3      AIC_3  Σ_3(1,1)  Σ_3(1,2)  Ω_3(1,1)  Ω_3(1,2)
8    3      AIC_3  Σ_3(2,1)  Σ_3(2,2)  Ω_3(2,1)  Ω_3(2,2)

Obs  FOR1_1      FOR1_2      FOR2_1      FOR2_2      FOR3_1      FOR3_2
1    .           .           .           .           .           .
2    .           .           .           .           .           .
3    Φ_1^1(1,1)  Φ_1^1(1,2)  .           .           .           .
4    Φ_1^1(2,1)  Φ_1^1(2,2)  .           .           .           .
5    Φ_1^2(1,1)  Φ_1^2(1,2)  Φ_2^2(1,1)  Φ_2^2(1,2)  .           .
6    Φ_1^2(2,1)  Φ_1^2(2,2)  Φ_2^2(2,1)  Φ_2^2(2,2)  .           .
7    Φ_1^3(1,1)  Φ_1^3(1,2)  Φ_2^3(1,1)  Φ_2^3(1,2)  Φ_3^3(1,1)  Φ_3^3(1,2)
8    Φ_1^3(2,1)  Φ_1^3(2,2)  Φ_2^3(2,1)  Φ_2^3(2,2)  Φ_3^3(2,1)  Φ_3^3(2,2)

Obs  BACK1_1     BACK1_2     BACK2_1     BACK2_2     BACK3_1     BACK3_2
1    .           .           .           .           .           .
2    .           .           .           .           .           .
3    Ψ_1^1(1,1)  Ψ_1^1(1,2)  .           .           .           .
4    Ψ_1^1(2,1)  Ψ_1^1(2,2)  .           .           .           .
5    Ψ_1^2(1,1)  Ψ_1^2(1,2)  Ψ_2^2(1,1)  Ψ_2^2(1,2)  .           .
6    Ψ_1^2(2,1)  Ψ_1^2(2,2)  Ψ_2^2(2,1)  Ψ_2^2(2,2)  .           .
7    Ψ_1^3(1,1)  Ψ_1^3(1,2)  Ψ_2^3(1,1)  Ψ_2^3(1,2)  Ψ_3^3(1,1)  Ψ_3^3(1,2)
8    Ψ_1^3(2,1)  Ψ_1^3(2,2)  Ψ_2^3(2,1)  Ψ_2^3(2,2)  Ψ_3^3(2,1)  Ψ_3^3(2,2)
The estimated autoregressive parameters can be used in the IML procedure to obtain autoregressive estimates of the spectral density function or forecasts based on the autoregressive models.
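For example, the following statements (a sketch that assumes a two-variable system and an OUTAR= data set named AR) read the order 2 forward autoregressive coefficient matrices into PROC IML:

   proc iml;
      use ar;
      /* lag 1 and lag 2 matrices of the order 2 model; row k, column l
         holds the (k,l) element, as described for the FORk_l variables */
      read all var {for1_1 for1_2} where(order=2) into phi1;
      read all var {for2_1 for2_2} where(order=2) into phi2;
      print phi1 phi2;
   quit;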
OUTMODEL= Data Set

The OUTMODEL= data set contains the estimates of the F and G matrices and their standard errors, the names of the components of the state vector, and the estimates of the innovation covariance matrix. The variables contained in the OUTMODEL= data set are as follows:

- the BY variables

- STATEVEC, a character variable that contains the name of the component of the state vector corresponding to the observation. The STATEVEC variable has the value STD for standard deviations observations, which contain the standard errors for the estimates given in the preceding observation.
- F_j, numeric variables that contain the columns of the F matrix. The variable F_j contains the jth column of F. The number of F_j variables is equal to the value of the DIMMAX= option. If the model is of smaller dimension, the extraneous variables are set to missing.

- G_j, numeric variables that contain the columns of the G matrix. The variable G_j contains the jth column of G. The number of G_j variables is equal to r, the dimension of $x_t$ given by the number of variables in the VAR statement.

- SIG_j, numeric variables that contain the columns of the innovation covariance matrix. The variable SIG_j contains the jth column of $\Sigma_{ee}$. There are r variables SIG_j.

Table 26.3 shows an example of the OUTMODEL= data set, with $x_t = (x_t, y_t)'$, $z_t = (x_t, y_t, x_{t+1|t})'$, and DIMMAX=4. In Table 26.3, $F_{i,j}$ and $G_{i,j}$ are the i,jth elements of F and G respectively. Note that all elements for F_4 are missing because F is a 3 × 3 matrix.
Table 26.3  Values in the OUTMODEL= Data Set

Obs  STATEVEC   F_1         F_2         F_3         F_4  G_1         G_2         SIG_1   SIG_2
1    X(T;T)     0           0           1           .    1           0           Σ(1,1)  Σ(1,2)
2    STD        .           .           .           .    .           .           .       .
3    Y(T;T)     F(2,1)      F(2,2)      F(2,3)      .    0           1           Σ(2,1)  Σ(2,2)
4    STD        std F(2,1)  std F(2,2)  std F(2,3)  .    .           .           .       .
5    X(T+1;T)   F(3,1)      F(3,2)      F(3,3)      .    G(3,1)      G(3,2)      .       .
6    STD        std F(3,1)  std F(3,2)  std F(3,3)  .    std G(3,1)  std G(3,2)  .       .
Printed Output

The printed output produced by the STATESPACE procedure includes the following:

1. descriptive statistics, which include the number of observations used, the names of the variables, their means and standard deviations (Std), and the differencing operations used

2. the Akaike information criteria for the sequence of preliminary autoregressive models

3. if the PRINTOUT=LONG option is specified, the sample autocovariance matrices of the input series at various lags

4. if the PRINTOUT=LONG option is specified, the sample autocorrelation matrices of the input series

5. a schematic representation of the autocorrelation matrices, showing the significant autocorrelations

6. if the PRINTOUT=LONG option is specified, the partial autoregressive matrices. (These are $\Phi_p^p$ as described in the section "Preliminary Autoregressive Models" on page 1738.)
7. a schematic representation of the partial autocorrelation matrices, showing the significant partial autocorrelations

8. the Yule-Walker estimates of the autoregressive parameters for the autoregressive model with the minimum AIC

9. if the PRINTOUT=LONG option is specified, the autocovariance matrices of the residuals of the minimum AIC model. This is the sequence of estimated innovation variance matrices for the solutions of the Yule-Walker equations.

10. if the PRINTOUT=LONG option is specified, the autocorrelation matrices of the residuals of the minimum AIC model

11. if the CANCORR option is specified, the canonical correlations analysis for each potential state vector considered in the state vector selection process. This includes the potential state vector, the canonical correlations, the information criterion for the smallest canonical correlation, Bartlett's $\chi^2$ statistic ("Chi Square") for the smallest canonical correlation, and the degrees of freedom of Bartlett's $\chi^2$.

12. the components of the chosen state vector

13. the preliminary estimate of the transition matrix, F, the input matrix, G, and the variance matrix for the innovations, $\Sigma_{ee}$

14. if the ITPRINT option is specified, the iteration history of the likelihood maximization. For each iteration, this shows the iteration number, the number of step halvings, the determinant of the innovation variance matrix, the damping factor Lambda, and the values of the parameters.

15. the state vector, printed again to aid interpretation of the following listing of F and G

16. the final estimate of the transition matrix F

17. the final estimate of the input matrix G

18. the final estimate of the variance matrix for the innovations $\Sigma_{ee}$

19. a table that lists the estimates of the free parameters in F and G and their standard errors and t statistics

20. if the COVB option is specified, the covariance matrix of the parameter estimates

21. if the COVB option is specified, the correlation matrix of the parameter estimates

22. if the PRINT option is specified, the forecasts and their standard errors
ODS Table Names

PROC STATESPACE assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.
Table 26.4  ODS Tables Produced in PROC STATESPACE

ODS Table Name       Description                                Option

NObs                 number of observations                     default
Summary              simple summary statistics table            default
InfoCriterion        information criterion table                default
CovLags              covariance matrices of input series        PRINTOUT=LONG
CorrLags             correlation matrices of input series       PRINTOUT=LONG
PartialAR            partial autoregressive matrices            PRINTOUT=LONG
YWEstimates          Yule-Walker estimates for minimum AIC      default
CovResiduals         covariance of residuals                    PRINTOUT=LONG
CorrResiduals        residual correlations from AR models       PRINTOUT=LONG
StateVector          state vector table                         default
CorrGraph            schematic representation of correlations   default
TransitionMatrix     transition matrix                          default
InputMatrix          input matrix                               default
VarInnov             variance matrix for the innovation         default
CovB                 covariance of parameter estimates          COVB
CorrB                correlation of parameter estimates         COVB
CanCorr              canonical correlation analysis             CANCORR
IterHistory          iterative fitting table                    ITPRINT
ParameterEstimates   parameter estimates table                  default
Forecasts            forecasts table                            PRINT
ConvergenceStatus    convergence status table                   default
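For example, the following statements (a sketch) use the ODS OUTPUT statement with the table names in Table 26.4 to save the parameter estimates table to a data set:

   ods output ParameterEstimates=pest;
   proc statespace data=in out=out;
      var x(1) y(1);
   run;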
Examples: STATESPACE Procedure
Example 26.1: Series J from Box and Jenkins

This example analyzes the gas furnace data (series J) from Box and Jenkins. (The data are not shown; see Box and Jenkins (1976) for the data.)

First, a model is selected and fit automatically using the following statements.

   title1 'Gas Furnace Data';
   title2 'Box & Jenkins Series J';
   title3 'Automatically Selected Model';

   proc statespace data=seriesj cancorr;
      var x y;
   run;
The results for the automatically selected model are shown in Output 26.1.1.

Output 26.1.1 Results for Automatically Selected Model

                     Gas Furnace Data
                  Box & Jenkins Series J
                Automatically Selected Model

                 The STATESPACE Procedure

             Number of Observations        296

                                    Standard
             Variable   Mean        Error
             x          -0.05683    1.072766
             y          53.50912    3.202121

        Information Criterion for Autoregressive Models

 Lag=0     Lag=1     Lag=2     Lag=3     Lag=4     Lag=5     Lag=6     Lag=7     Lag=8
 651.3862  -1033.57  -1632.96  -1645.12  -1651.52  -1648.91  -1649.34  -1643.15  -1638.56

 Lag=9     Lag=10
 -1634.8   -1633.59

            Schematic Representation of Correlations

 Name/Lag   0    1    2    3    4    5    6    7    8    9    10
 x          +-   +-   +-   +-   +-   +-   +-   +-   +-   +-   +-
 y          -+   -+   -+   -+   -+   -+   -+   -+   -+   -+   -+

 + is > 2*std error,  - is < -2*std error,  . is between
Output 26.1.2 Results for Automatically Selected Model

        Schematic Representation of Partial Autocorrelations

 Name/Lag   1    2    3    4    5    6    7    8    9    10
 x          +.   -.   +.   ..   ..   -.   ..   ..   ..   ..
 y          -+   --   -.   .+   ..   ..   ..   ..   ..   .+

 + is > 2*std error,  - is < -2*std error,  . is between

             Yule-Walker Estimates for Minimum AIC

      ------Lag=1------   ------Lag=2------   ------Lag=3------   ------Lag=4------
      x         y         x         y         x         y         x         y
 x    1.925887  -0.00124  -1.20166  0.004224  0.116918  -0.00867  0.104236  0.003268
 y    0.050496  1.299793  -0.02046  -0.3277   -0.71182  -0.25701  0.195411  0.133417
Output 26.1.3 Results for Automatically Selected Model

                     Gas Furnace Data
                  Box & Jenkins Series J
                Automatically Selected Model

                 The STATESPACE Procedure

              Canonical Correlations Analysis

                                Information     Chi
   x(T;T)  y(T;T)  x(T+1;T)     Criterion       Square      DF

   1       1       0.804883     292.9228        304.7481    8
Output 26.1.4 Results for Automatically Selected Model

                     Gas Furnace Data
                  Box & Jenkins Series J
                Automatically Selected Model

                 The STATESPACE Procedure
      Selected Statespace Form and Preliminary Estimates

                       State Vector

   x(T;T)   y(T;T)   x(T+1;T)   y(T+1;T)   y(T+2;T)

               Estimate of Transition Matrix

   0          0          1          0          0
   0          0          0          1          0
   -0.84718   0.026794   1.711715   -0.05019   0
   0          0          0          0          1
   -0.19785   0.334274   -0.18174   -1.23557   1.787475

               Input Matrix for Innovation

              1           0
              0           1
              1.925887    -0.00124
              0.050496    1.299793
              0.142421    1.361696

Output 26.1.5 Results for Automatically Selected Model

              Variance Matrix for Innovation

              0.035274    -0.00734
              -0.00734    0.097569
Output 26.1.6 Results for Automatically Selected Model

                     Gas Furnace Data
                  Box & Jenkins Series J
                Automatically Selected Model

                 The STATESPACE Procedure
         Selected Statespace Form and Fitted Model

                       State Vector

   x(T;T)   y(T;T)   x(T+1;T)   y(T+1;T)   y(T+2;T)

               Estimate of Transition Matrix

   0          0          1          0          0
   0          0          0          1          0
   -0.86192   0.030609   1.724235   -0.05483   0
   0          0          0          0          1
   -0.34839   0.292124   -0.09435   -1.09823   1.671418

               Input Matrix for Innovation

              1           0
              0           1
              1.92442     -0.00416
              0.015621    1.258495
              0.08058     1.353204
Output 26.1.7 Results for Automatically Selected Model

              Variance Matrix for Innovation

              0.035579    -0.00728
              -0.00728    0.095577

                   Parameter Estimates

                              Standard
   Parameter    Estimate      Error        t Value

   F(3,1)       -0.86192      0.072961     -11.81
   F(3,2)       0.030609      0.026167       1.17
   F(3,3)       1.724235      0.061599      27.99
   F(3,4)       -0.05483      0.030169      -1.82
   F(5,1)       -0.34839      0.135253      -2.58
   F(5,2)       0.292124      0.046299       6.31
   F(5,3)       -0.09435      0.096527      -0.98
   F(5,4)       -1.09823      0.109525     -10.03
   F(5,5)       1.671418      0.083737      19.96
   G(3,1)       1.924420      0.058162      33.09
   G(3,2)       -0.00416      0.035255      -0.12
   G(4,1)       0.015621      0.095771       0.16
   G(4,2)       1.258495      0.055742      22.58
   G(5,1)       0.080580      0.151622       0.53
   G(5,2)       1.353204      0.091388      14.81
The two series are believed to have a transfer function relation with the gas rate (variable X) as the input and the CO2 concentration (variable Y) as the output. Since the parameter estimates shown in Output 26.1.1 support this kind of model, the model is reestimated with the feedback parameters restricted to 0. The following statements fit the transfer function (no feedback) model.

   title3 'Transfer Function Model';
   proc statespace data=seriesj printout=none;
      var x y;
      restrict f(3,2)=0 f(3,4)=0
               g(3,2)=0 g(4,1)=0 g(5,1)=0;
   run;

The last two pages of the output are shown in Output 26.1.8.
The last two pages of the output are shown in Output 26.1.8. Output 26.1.8 STATESPACE Output for Transfer Function Model Gas Furnace Data Box & Jenkins Series J Transfer Function Model The STATESPACE Procedure Selected Statespace Form and Fitted Model State Vector x(T;T)
y(T;T)
x(T+1;T)
y(T+1;T)
y(T+2;T)
Output 26.1.8 continued

               Estimate of Transition Matrix

   0          0          1          0          0
   0          0          0          1          0
   -0.68882   0          1.598717   0          0
   0          0          0          0          1
   -0.35944   0.284179   -0.0963    -1.07313   1.650047

               Input Matrix for Innovation

              1           0
              0           1
              1.923446    0
              0           1.260856
              0           1.346332
Output 26.1.9 STATESPACE Output for Transfer Function Model

              Variance Matrix for Innovation

              0.036995    -0.0072
              -0.0072     0.095712

                   Parameter Estimates

                              Standard
   Parameter    Estimate      Error        t Value

   F(3,1)       -0.68882      0.050549     -13.63
   F(3,3)       1.598717      0.050924      31.39
   F(5,1)       -0.35944      0.229044      -1.57
   F(5,2)       0.284179      0.096944       2.93
   F(5,3)       -0.09630      0.140876      -0.68
   F(5,4)       -1.07313      0.250385      -4.29
   F(5,5)       1.650047      0.188533       8.75
   G(3,1)       1.923446      0.056328      34.15
   G(4,2)       1.260856      0.056464      22.33
   G(5,2)       1.346332      0.091086      14.78
References

Akaike, H. (1974), "Markovian Representation of Stochastic Processes and Its Application to the Analysis of Autoregressive Moving Average Processes," Annals of the Institute of Statistical Mathematics, 26, 363–387.

Akaike, H. (1976), "Canonical Correlations Analysis of Time Series and the Use of an Information Criterion," in Advances and Case Studies in System Identification, eds. R. Mehra and D.G. Lainiotis, New York: Academic Press.

Anderson, T.W. (1971), The Statistical Analysis of Time Series, New York: John Wiley & Sons.

Ansley, C.F. and Newbold, P. (1979), "Multivariate Partial Autocorrelations," Proceedings of the Business and Economic Statistics Section, American Statistical Association, 349–353.

Box, G.E.P. and Jenkins, G. (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Brockwell, P.J. and Davis, R.A. (1991), Time Series: Theory and Methods, 2nd Edition, New York: Springer-Verlag.

Hannan, E.J. (1970), Multiple Time Series, New York: John Wiley & Sons.

Hannan, E.J. (1976), "The Identification and Parameterization of ARMAX and State Space Forms," Econometrica, 44, 713–722.

Harvey, A.C. (1981a), The Econometric Analysis of Time Series, New York: John Wiley & Sons.

Harvey, A.C. (1981b), Time Series Models, New York: John Wiley & Sons.

Jones, R.H. (1974), "Identification and Autoregressive Spectrum Estimation," IEEE Transactions on Automatic Control, AC-19, 894–897.

Pham-Dinh-Tuan (1978), "On the Fitting of Multivariate Processes of the Autoregressive Moving Average Type," Biometrika, 65, 99–107.

Priestley, M.B. (1980), "System Identification, Kalman Filtering, and Stochastic Control," in Directions in Time Series, eds. D.R. Brillinger and G.C. Tiao, Institute of Mathematical Statistics.

Whittle, P. (1963), "On the Fitting of Multivariate Autoregressions and the Approximate Canonical Factorization of a Spectral Density Matrix," Biometrika, 50, 129–134.
Chapter 27
The SYSLIN Procedure

Contents
    Overview: SYSLIN Procedure                                          1762
    Getting Started: SYSLIN Procedure                                   1763
        An Example Model                                                1763
        Variables in a System of Equations                              1764
        Using PROC SYSLIN                                               1765
        OLS Estimation                                                  1765
        Two-Stage Least Squares Estimation                              1767
        LIML, K-Class, and MELO Estimation                              1769
        SUR, 3SLS, and FIML Estimation                                  1769
        Computing Reduced Form Estimates                                1773
        Restricting Parameter Estimates                                 1774
        Testing Parameters                                              1776
        Saving Residuals and Predicted Values                           1778
        Plotting Residuals                                              1779
    Syntax: SYSLIN Procedure                                            1780
        Functional Summary                                              1781
        PROC SYSLIN Statement                                           1782
        BY Statement                                                    1785
        ENDOGENOUS Statement                                            1785
        IDENTITY Statement                                              1786
        INSTRUMENTS Statement                                           1786
        MODEL Statement                                                 1786
        OUTPUT Statement                                                1788
        RESTRICT Statement                                              1789
        SRESTRICT Statement                                             1790
        STEST Statement                                                 1791
        TEST Statement                                                  1792
        VAR Statement                                                   1794
        WEIGHT Statement                                                1794
    Details: SYSLIN Procedure                                           1795
        Input Data Set                                                  1795
        Estimation Methods                                              1795
        ANOVA Table for Instrumental Variables Methods                  1798
        The R-Square Statistics                                         1799
        Computational Details                                           1799
        Missing Values                                                  1802
        OUT= Data Set                                                   1803
        OUTEST= Data Set                                                1803
        OUTSSCP= Data Set                                               1804
        Printed Output                                                  1805
        ODS Table Names                                                 1807
        ODS Graphics                                                    1808
    Examples: SYSLIN Procedure                                          1808
        Example 27.1: Klein's Model I Estimated with LIML and 3SLS      1808
        Example 27.2: Grunfeld's Model Estimated with SUR               1816
        Example 27.3: Illustration of ODS Graphics                      1819
    References                                                          1823
Overview: SYSLIN Procedure

The SYSLIN procedure estimates parameters in an interdependent system of linear regression equations.

Ordinary least squares (OLS) estimates are biased and inconsistent when current period endogenous variables appear as regressors in other equations in the system. The errors of a set of related regression equations are often correlated, and the efficiency of the estimates can be improved by taking these correlations into account. The SYSLIN procedure provides several techniques that produce consistent and asymptotically efficient estimates for systems of regression equations.

The SYSLIN procedure provides the following estimation methods:

   ordinary least squares (OLS)
   two-stage least squares (2SLS)
   limited information maximum likelihood (LIML)
   K-class
   seemingly unrelated regressions (SUR)
   iterated seemingly unrelated regressions (ITSUR)
   three-stage least squares (3SLS)
   iterated three-stage least squares (IT3SLS)
   full information maximum likelihood (FIML)
   minimum expected loss (MELO)
Other features of the SYSLIN procedure enable you to:

   impose linear restrictions on the parameter estimates
   test linear hypotheses about the parameters
   write predicted and residual values to an output SAS data set
   write parameter estimates to an output SAS data set
   write the crossproducts matrix (SSCP) to an output SAS data set
   use raw data, correlations, covariances, or cross products as input
Getting Started: SYSLIN Procedure

This section introduces the use of the SYSLIN procedure. The problem of dependent regressors is introduced using a supply and demand example. This section explains the terminology used for variables in a system of regression equations and introduces the SYSLIN procedure statements for declaring the roles the variables play. The syntax used for the different estimation methods and the output produced are shown.
An Example Model

In simultaneous systems of equations, endogenous variables are determined jointly rather than sequentially. Consider the following supply and demand functions for some product:

   Q_D = a_1 + b_1 P + c_1 Y + d_1 S + e_1   (demand)

   Q_S = a_2 + b_2 P + c_2 U + e_2           (supply)

   Q = Q_D = Q_S                             (market equilibrium)

The variables in this system are as follows:

   Q_D   quantity demanded
   Q_S   quantity supplied
   Q     the observed quantity sold, which equates quantity supplied and
         quantity demanded in equilibrium
   P     price per unit
   Y     income
   S     price of substitutes
   U     unit cost
   e_1   the random error term for the demand equation
   e_2   the random error term for the supply equation
In this system, quantity demanded depends on price, income, and the price of substitutes. Consumers normally purchase more of a product when prices are lower and when income and the price of substitute goods are higher. Quantity supplied depends on price and the unit cost of production. Producers supply more when price is high and when unit cost is low. The actual price and quantity sold are determined jointly by the values that equate demand and supply. Since price and quantity are jointly endogenous variables, both structural equations are necessary to adequately describe the observed values. A critical assumption of OLS is that the regressors are uncorrelated with the residual. When current endogenous variables appear as regressors in other equations (endogenous variables depend on each other), this assumption is violated and the OLS parameter estimates are biased and inconsistent. The bias caused by the violated assumptions is called simultaneous equation bias. Neither the demand nor supply equation can be estimated consistently by OLS.
Variables in a System of Equations

Before explaining how to use the SYSLIN procedure, it is useful to define some terms. The variables in a system of equations can be classified as follows:

   Endogenous variables, which are also called jointly dependent or response variables, are the variables determined by the system. Endogenous variables can also appear on the right-hand side of equations.

   Exogenous variables are independent variables that do not depend on any of the endogenous variables in the system.

   Predetermined variables include both the exogenous variables and lagged endogenous variables, which are past values of endogenous variables determined at previous time periods. PROC SYSLIN does not compute lagged values; any lagged endogenous variables must be computed in a preceding DATA step.

   Instrumental variables are predetermined variables used in obtaining predicted values for the current period endogenous variables by a first-stage regression. The use of instrumental variables characterizes estimation methods such as two-stage least squares and three-stage least squares. Instrumental variables estimation methods substitute these first-stage predicted values for endogenous variables when they appear as regressors in model equations.
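For example, a two-stage least squares fit of the supply and demand system can declare these variable roles as follows. This is a minimal sketch rather than one of the examples that follow; it assumes the simulated data set WORK.IN that is generated in the next section.

   proc syslin data=in 2sls;
      endogenous p q;        /* jointly determined variables        */
      instruments y s u;     /* predetermined (exogenous) variables */
      demand: model q = p y s;
      supply: model q = p u;
   run;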
Using PROC SYSLIN

First specify the input data set and estimation method in the PROC SYSLIN statement. If any model uses dependent regressors, and you are using an instrumental variables regression method, declare the dependent regressors with an ENDOGENOUS statement and declare the instruments with an INSTRUMENTS statement. Next, use MODEL statements to specify the structural equations of the system.

The use of different estimation methods is shown by the following examples. These examples use the simulated data set WORK.IN given below.

   data in;
      label q = "Quantity"
            p = "Price"
            s = "Price of Substitutes"
            y = "Income"
            u = "Unit Cost";
      drop i e1 e2;
      p = 0;
      q = 0;
      do i = 1 to 60;
         y = 1 + .05*i + .15*rannor(123);
         u = 2 + .05*rannor(123) + .05*rannor(123);
         s = 4 - .001*(i-10)*(i-110) + .5*rannor(123);
         e1 = .15 * rannor(123);
         e2 = .15 * rannor(123);
         demandx = 1 + .3 * y + .35 * s + e1;
         supplyx = -1 - 1 * u + e2 - .4*e1;
         q = 1.4/2.15 * demandx + .75/2.15 * supplyx;
         p = ( - q + supplyx ) / -1.4;
         output;
      end;
   run;
OLS Estimation

PROC SYSLIN performs OLS regression if you do not specify a method of estimation in the PROC SYSLIN statement. OLS does not use instruments, so the ENDOGENOUS and INSTRUMENTS statements can be omitted. The following statements estimate the supply and demand model shown previously:

   proc syslin data=in;
      demand: model q = p y s;
      supply: model q = p u;
   run;
The PROC SYSLIN output for the demand equation is shown in Figure 27.1, and the output for the supply equation is shown in Figure 27.2.
Figure 27.1 OLS Results for Demand Equation

                        The SYSLIN Procedure
                  Ordinary Least Squares Estimation

                Model                 DEMAND
                Dependent Variable    q
                Label                 Quantity

                        Analysis of Variance

                                Sum of        Mean
   Source            DF        Squares      Square    F Value    Pr > F
   Model              3       9.587901    3.195967     398.31    <.0001
   Error             56       0.449338    0.008024
   Corrected Total   59       10.03724

        Root MSE          0.08958      R-Square
        Dependent Mean    1.30095      Adj R-Sq
        Coeff Var         6.88542

                        Parameter Estimates

                    Parameter    Standard
   Variable   DF     Estimate       Error    t Value    Pr > |t|
   Intercept   1     -0.47677    0.210239      -2.27      0.0272
   p           1     0.123326    0.105177       1.17      0.2459
   y           1     0.201282    0.032403       6.21      <.0001
   s           1     0.167258    0.024091       6.94      <.0001
Output 27.2.1 PROC SYSLIN Output for GE

                        The SYSLIN Procedure
                  Ordinary Least Squares Estimation

                        F Value      20.34

                        Parameter Estimates

                    Parameter    Standard
   Variable   DF     Estimate       Error    t Value    Pr > |t|
   Intercept   1     -9.95631    31.37425      -0.32      0.7548
   ge_f        1     0.026551    0.015566       1.71      0.1063
   ge_c        1     0.151694    0.025704       5.90      <.0001

   Variable     Variable Label
   Intercept    Intercept
   ge_f         Value of Outstanding Shares Lagged, GE
   ge_c         Capital Stock Lagged, GE

Output 27.2.2 PROC SYSLIN Output for Westinghouse

                        The SYSLIN Procedure
                  Ordinary Least Squares Estimation

                        F Value      24.76

                        Parameter Estimates

                    Parameter    Standard
   Variable   DF     Estimate       Error    t Value    Pr > |t|
   Intercept   1     -0.50939    8.015289      -0.06      0.9501
   wh_f        1     0.052894    0.015707       3.37      0.0037
   wh_c        1     0.092406    0.056099       1.65      0.1179

   Variable     Variable Label
   Intercept    Intercept
   wh_f         Value of Outstanding Shares Lagged, WH
   wh_c         Capital Stock Lagged, WH
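The seemingly unrelated regression results that follow can be produced by statements of the following general form. This is a sketch only: the data set name GRUNFELD and the dependent variable names ge_i and wh_i are assumed from the output labels rather than taken from the original example code.

   proc syslin data=grunfeld sur;
      ge:      model ge_i = ge_f ge_c;   /* GE investment equation           */
      westing: model wh_i = wh_f wh_c;   /* Westinghouse investment equation */
   run;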
Output 27.2.3 PROC SYSLIN Output for SUR

                        The SYSLIN Procedure
              Seemingly Unrelated Regression Estimation

                    Cross Model Covariance
                            GE      WESTING
           GE          777.446      207.587
           WESTING     207.587      104.308

                    Cross Model Correlation
                            GE      WESTING
           GE          1.00000      0.72896
           WESTING     0.72896      1.00000

                 Cross Model Inverse Correlation
                            GE      WESTING
           GE          2.13397     -1.55559
           WESTING    -1.55559      2.13397

                 Cross Model Inverse Covariance
                            GE      WESTING
           GE         0.002745     -.005463
           WESTING    -.005463     0.020458

Output 27.2.4 PROC SYSLIN Output for SUR

           System Weighted MSE               0.9719
           Degrees of freedom                    34
           System Weighted R-Square          0.6284
Output 27.2.4 continued

                Model                 GE
                Dependent Variable    ge_i
                Label                 Gross Investment, GE

                        Parameter Estimates

                    Parameter    Standard
   Variable   DF     Estimate       Error    t Value    Pr > |t|
   Intercept   1     -27.7193    29.32122      -0.95      0.3577
   ge_f        1     0.038310    0.014415       2.66      0.0166
   ge_c        1     0.139036    0.024986       5.56      <.0001

   Variable     Variable Label
   Intercept    Intercept
   ge_f         Value of Outstanding Shares Lagged, GE
   ge_c         Capital Stock Lagged, GE

                Model                 WESTING
                Dependent Variable    wh_i
                Label                 Gross Investment, WH

                        Parameter Estimates

                    Parameter    Standard
   Variable   DF     Estimate       Error    t Value    Pr > |t|
   Intercept   1     -1.25199    7.545217      -0.17      0.8702
   wh_f        1     0.057630    0.014546       3.96      0.0010
   wh_c        1     0.063978    0.053041       1.21      0.2443

   Variable     Variable Label
   Intercept    Intercept
   wh_f         Value of Outstanding Shares Lagged, WH
   wh_c         Capital Stock Lagged, WH
Example 27.3: Illustration of ODS Graphics

This example illustrates the use of ODS Graphics. It is a continuation of the section "Example 27.1: Klein's Model I Estimated with LIML and 3SLS" on page 1808. The graphical displays are requested by specifying the ODS GRAPHICS ON statement before running PROC SYSLIN. For information about the graphics available in the SYSLIN procedure, see the section "ODS Graphics" on page 1808.

The following statements show how to generate ODS Graphics plots with the SYSLIN procedure. The plots of residuals for each one of the equations in the model are displayed in Output 27.3.1 through Output 27.3.3.
   *---------------------------Klein's Model I----------------------------*
   | By L.R. Klein, Economic Fluctuations in the United States, 1921-1941 |
   | (1950), NY: John Wiley. A macro-economic model of the U.S. with      |
   | three behavioral equations, and several identities. See Theil, p.456.|
   *----------------------------------------------------------------------*;

   data klein;
      input year c p w i x wp g t k wsum;
      date = mdy(1,1,year);
      format date monyy.;
      y    = c + i + g - t;
      yr   = year - 1931;
      klag = lag(k);
      plag = lag(p);
      xlag = lag(x);
      label year='Year'
            date='Date'
            c   ='Consumption'
            p   ='Profits'
            w   ='Private Wage Bill'
            i   ='Investment'
            k   ='Capital Stock'
            y   ='National Income'
            x   ='Private Production'
            wsum='Total Wage Bill'
            wp  ='Govt Wage Bill'
            g   ='Govt Demand'
            t   ='Taxes'
            klag='Capital Stock Lagged'
            plag='Profits Lagged'
            xlag='Private Product Lagged'
            yr  ='YEAR-1931';
   datalines;
   1920    .   12.7     .     .   44.9    .    .    .   182.8     .
   1921  41.9  12.4  25.5  -0.2   45.6  2.7  3.9  7.7   182.6  28.2
   1922  45.0  16.9  29.3   1.9   50.1  2.9  3.2  3.9   184.5  32.2
   1923  49.2  18.4  34.1   5.2   57.2  2.9  2.8  4.7   189.7  37.0
   1924  50.6  19.4  33.9   3.0   57.1  3.1  3.5  3.8   192.7  37.0
   1925  52.6  20.1  35.4   5.1   61.0  3.2  3.3  5.5   197.8  38.6
   1926  55.1  19.6  37.4   5.6   64.0  3.3  3.3  7.0   203.4  40.7
   1927  56.2  19.8  37.9   4.2   64.4  3.6  4.0  6.7   207.6  41.5
   1928  57.3  21.1  39.2   3.0   64.5  3.7  4.2  4.2   210.6  42.9
   1929  57.8  21.7  41.3   5.1   67.0  4.0  4.1  4.0   215.7  45.3

   ... more lines ...
   ods graphics on;

   proc syslin data=klein outest=b liml plots(unpack only)=residual;
      endogenous c p w i x wsum k y;
      instruments klag plag xlag wp g t yr;
      consume: model c = p plag wsum;
      invest:  model i = p plag klag;
      labor:   model w = x xlag yr;
   run;
Output 27.3.1 Residuals Diagnostic Plots for Consumption
Output 27.3.2 Residuals Diagnostic Plots for Investments
Output 27.3.3 Residuals Diagnostic Plots for Labor
References

Basmann, R.L. (1960), "On Finite Sample Distributions of Generalized Classical Linear Identifiability Test Statistics," Journal of the American Statistical Association, 55, 650–659.

Fuller, W.A. (1977), "Some Properties of a Modification of the Limited Information Estimator," Econometrica, 45, 939–952.

Hausman, J.A. (1975), "An Instrumental Variable Approach to Full Information Estimators for Linear and Certain Nonlinear Econometric Models," Econometrica, 43, 727–738.

Johnston, J. (1984), Econometric Methods, Third Edition, New York: McGraw-Hill.

Judge, George G., W. E. Griffiths, R. Carter Hill, Helmut Lutkepohl, and Tsoung-Chao Lee (1985), The Theory and Practice of Econometrics, Second Edition, New York: John Wiley & Sons.

Maddala, G.S. (1977), Econometrics, New York: McGraw-Hill.
Park, S.B. (1982), "Some Sampling Properties of Minimum Expected Loss (MELO) Estimators of Structural Coefficients," Journal of Econometrics, 18, 295–311.

Pindyck, R.S. and Rubinfeld, D.L. (1981), Econometric Models and Economic Forecasts, Second Edition, New York: McGraw-Hill.

Pringle, R.M. and Rayner, A.A. (1971), Generalized Inverse Matrices with Applications to Statistics, New York: Hafner Publishing Company.

Rao, P. (1974), "Specification Bias in Seemingly Unrelated Regressions," in Essays in Honor of Tinbergen, Volume 2, New York: International Arts and Sciences Press.

Savin, N.E. and White, K.J. (1978), "Testing for Autocorrelation with Missing Observations," Econometrica, 46, 59–66.

Theil, H. (1971), Principles of Econometrics, New York: John Wiley & Sons.

Zellner, A. (1962), "An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias," Journal of the American Statistical Association, 57, 348–368.

Zellner, A. (1978), "Estimation of Functions of Population Means and Regression Coefficients: A Minimum Expected Loss (MELO) Approach," Journal of Econometrics, 8, 127–158.

Zellner, A. and Park, S. (1979), "Minimum Expected Loss (MELO) Estimators for Functions of Parameters and Structural Coefficients of Econometric Models," Journal of the American Statistical Association, 74, 185–193.
Chapter 28
The TIMEID Procedure (Experimental)

Contents
    Overview: TIMEID Procedure                                  1825
    Getting Started: TIMEID Procedure                           1826
    Syntax: TIMEID Procedure                                    1826
        Functional Summary                                      1827
        PROC TIMEID Statement                                   1828
        BY Statement                                            1829
        ID Statement                                            1829
    Details: TIMEID Procedure                                   1831
        Time ID Diagnostics                                     1831
        Diagnostic Output Representation                        1831
        Inferring Time Intervals and Alignments                 1833
        Data Set Output                                         1834
        Printed Tabular Output                                  1836
        ODS Graphics                                            1837
    Examples: TIMEID Procedure                                  1838
        Example 28.1: Examining a Weekly Time ID Variable       1838
        Example 28.2: Inferring a Date Interval                 1845
        Example 28.3: Examining Multiple BY Groups              1846
Overview: TIMEID Procedure

The TIMEID procedure evaluates a variable in an input data set for its suitability as a time ID variable in SAS procedures and solutions that are used for time series analysis. PROC TIMEID assesses how well a time interval specification fits SAS date or datetime values, or observation numbers used to index a time series. The time interval used in this analysis can be either specified explicitly as input to PROC TIMEID or inferred by the procedure based on values of the time ID variable. The TIMEID procedure produces diagnostic information in the form of data sets and ODS tabular and plotted output. These diagnostic results summarize characteristics of the time ID variable that can help determine its use as an index in other time series procedures and solutions.

PROC TIMEID is intended for use as a tool to either identify the time interval of a variable or prepare problematic data sets for use in subsequent time series analyses. In particular, this procedure can
be used to investigate inconsistencies between time ID values and the ID statement options used in other SAS procedures and solutions.
Getting Started: TIMEID Procedure

When a data set contains a time ID variable with corrupted, missing, or duplicate values, PROC TIMEID can help isolate and identify these problematic observations. For a data set with a small number of ID variable anomalies and a known time interval, a graphical depiction of the problem areas can be created using the following statements:

   proc timeid data=<input-data-set> plot=values;
      id <time-ID-variable> interval=<interval>;
   run;
For larger data sets whose quality is unknown, it can be useful to get a general overview of the relative number of observations with problematic time ID values. The following statements graphically summarize the prevalence of anomalous time ID values:

   proc timeid data=<input-data-set> plot=(intervalcounts offsets spans);
      id <time-ID-variable> interval=<interval>;
   run;
When prior knowledge of the time interval that separates observations is incomplete, PROC TIMEID can be used to infer the interval by omitting the INTERVAL= option from the ID statement, as in the following statements:

   proc timeid data=<input-data-set> outinterval=<output-data-set>;
      id <time-ID-variable>;
   run;
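For example, the following statements apply this inference template to a hypothetical data set WORK.SALES that contains a SAS date variable SALEDATE, and they save the recommended interval in WORK.SALEINT:

   proc timeid data=sales outinterval=saleint;
      id saledate;
   run;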
Syntax: TIMEID Procedure

The TIMEID procedure uses the following statements:

   PROC TIMEID options ;
      BY variables ;
      ID variable < options > ;
Functional Summary

The statements and options that control the TIMEID procedure are summarized in Table 28.1.

Table 28.1 Syntax Summary

Description                                              Statement       Option

Statements
  Specifies data sets and options                        PROC TIMEID
  Specifies BY-group processing                          BY
  Specifies the time ID variable                         ID

Data Set Options
  Specifies the input data set                           PROC TIMEID     DATA=
  Specifies the maximum number of ID values to analyze   PROC TIMEID     NBYOBS=
  Specifies the output frequency count data set          PROC TIMEID     OUTFREQ=
  Specifies the output interval data set                 PROC TIMEID     OUTINTERVAL=
  Specifies the detailed output interval data set        PROC TIMEID     OUTINTERVALDETAILS=

Time ID Options
  Specifies the interval alignment                       ID              ALIGN=
  Specifies that duplicate time ID values can be
    present in the DATA= data set                        ID              DUPLICATES
  Specifies the time interval between observations      ID              INTERVAL=
  Specifies that time ID variable values are not sorted  ID              NOTSORTED

Printing and Plotting Options
  Specifies the time ID format                           ID              FORMAT=
  Specifies the types of graphical output                PROC TIMEID     PLOT=
  Specifies the types of printed output                  PROC TIMEID     PRINT=

Miscellaneous Options
  Limits error and warning messages                      PROC TIMEID     MAXERROR=
PROC TIMEID Statement

   PROC TIMEID options ;

The following options can be used in the PROC TIMEID statement:

DATA=SAS-data-set
   names the SAS data set that contains the input data for the procedure. If the DATA= option is not specified, the most recently created SAS data set is used.

MAXERROR=number
   limits the number of warning and error messages produced during the execution of the procedure to the specified value. The default is MAXERROR=50. This option is particularly useful in BY-group processing, where it can be used to suppress recurring messages.

NBYOBS=number
   limits the number of observations that are used to analyze the time ID variable. The NBYOBS= option should be used instead of the OBS= data set option when BY variables are specified. The NBYOBS= option excludes observations from incomplete BY groups in the analysis. This option guarantees that any truncation of the DATA= data set occurs at a BY-group boundary. Only BY groups that are completely contained within the first number of observations are processed. When the NBYOBS= option is omitted, all observations are processed.

OUTFREQ=SAS-data-set
   names the output data set to contain the frequency counts of each unique value of the time ID variable. The frequency counts are performed on time ID values that are recorded in the DATA= data set. The time ID values are not aligned with respect to an interval prior to computation of the frequency counts. See the section "OUTFREQ= Data Set" on page 1834 for details.

OUTINTERVAL=SAS-data-set
   names the output data set to contain the time ID interval information that is summarized across all BY groups in the DATA= data set. See the section "OUTINTERVAL= Data Set" on page 1834 for details.

OUTINTERVALDETAILS=SAS-data-set
   names the output data set to contain the time ID interval information for each BY group. See the section "OUTINTERVALDETAILS= Data Set" on page 1835 for details.

PLOT(global-option)=request-option | (request-options)
   specifies the graphical output desired. By default, the TIMEID procedure produces no graphical output. The following global option is available:

   UNPACK | UNPACKPANELS
      suppresses paneling. By default, multiple plots can appear in some output panels. Specify UNPACKPANELS to get each plot in a separate panel.

   The following plot request-options are available:
   COUNTS | INTCNTS | INTERVALCOUNTS
      plots a histogram of the time ID interval counts.

   OFFSETS
      plots a histogram of the time offsets for the time ID values.

   PERIODS | SPANS
      plots a histogram of the spans between adjacent time ID values.

   VALUES
      plots a panel of the counts, offsets, and spans for each of the time ID values.

   ALL
      is equivalent to specifying PLOT=(INTERVALCOUNTS SPANS OFFSETS VALUES).
   See the section "Time ID Diagnostics" on page 1831 for details.

PRINT=option | (options)
   specifies the printed output desired. By default, the TIMEID procedure produces no printed output. The following printing options are available:

   COUNTS | INTCNTS | INTERVALCOUNTS
      prints a table that contains the counts of time ID values per interval.

   INTERVAL
      prints a summary of information about the time interval.

   OFFSETS
      prints a table that contains the time offsets for the time ID values.

   PERIODS | SPANS
      prints tables that contain statistics on the spans between adjacent time ID values.

   VALUES
      prints tables that contain offset, span, and count information for the time ID values.

   ALL
      is equivalent to specifying PRINT=(INTERVALCOUNTS SPANS INTERVAL OFFSETS VALUES).
See the section “Time ID Diagnostics” on page 1831 for details.
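For instance, the following sketch requests only the interval summary and the span tables for a hypothetical monthly data set WORK.SALES with date variable SALEDATE:

   proc timeid data=sales print=(interval spans);
      id saledate interval=month;
   run;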
BY Statement

   BY variables ;
A BY statement can be used with PROC TIMEID to obtain separate analyses for groups of observations defined by the BY variables.
ID Statement

   ID variable < options > ;
The ID statement names a numeric variable that identifies observations in the input and output data sets. The ID variable's values are assumed to be SAS date or datetime values. The ID statement options specify how the time ID values are spaced and aligned relative to a SAS date or datetime interval. The INTERVAL= option specifies the fundamental spacing that is used as the basis for counting intervals, offsets, and spans in the data. Specification of the ID variable in an ID statement is required.

ALIGN=alignment
   specifies the alignment of the identifying SAS date or datetime that is used to represent intervals. The value of the ALIGN= option is used in the analysis of the time ID variable. The ALIGN= option accepts the following values: BEGINNING | BEG | B, MIDDLE | MID | M, ENDING | END | E, and INFER. For example, ALIGN=BEGIN specifies that the identifying date for the interval is the beginning date in the interval. If the ALIGN= option is not specified, then the default alignment is BEGIN. ALIGN=INFER specifies that the alignment of values within time intervals be inferred from the time ID values.

DUPLICATES
   specifies that multiple observations in the DATA= data set can fall within the same time interval as defined by the time ID variable. When this option is omitted and multiple time ID values are encountered in a single time interval, error messages are written to the SAS log.

FORMAT=format
   specifies the SAS format used for time ID values in the data sets and in printed and plotted output that is generated by PROC TIMEID. If the FORMAT= option is not specified, the format applied to the input time ID variable is used. If neither of these formats is specified, the format is inferred from the INTERVAL= option.

INTERVAL=interval
   specifies the proposed time interval and shift that describe the time ID values in the input data set. See Chapter 4, "Date Intervals, Formats, and Functions," for more information about the intervals that can be specified. See the section "Time ID Diagnostics" on page 1831 for more information about how the INTERVAL= option determines the nature of diagnostic information reported by the TIMEID procedure. If no interval is specified, the procedure attempts to infer an interval from the input time ID values. See the section "Inferring Time Intervals and Alignments" on page 1833 for details about how the time interval is inferred.

NOTSORTED
   specifies that the observations in the DATA= data set are not sorted by the time ID variable. When this option is omitted, error messages are generated for time ID values that are not sorted in ascending order.
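The following sketch combines several of these options to diagnose a hypothetical data set WORK.RAWSALES whose monthly dates are unsorted, might contain duplicates, and are aligned to interval midpoints:

   proc timeid data=rawsales print=all;
      id saledate interval=month align=middle duplicates notsorted;
   run;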
Details: TIMEID Procedure
Time ID Diagnostics

For a specified time interval, PROC TIMEID decomposes the raw time ID values in an input data set into the following three quantities, whose values are represented by nonnegative integers at each unique time ID value in the input series:

interval counts   the number of observations that share each time interval in the data set.

offsets           the numerical difference between a time ID value and the aligned value for that time interval. The unit of measure used to express this distance is days for date values and seconds for datetime values. The offset is computed for each time ID value, t_i, by using the following SAS expression:

                     offset_i = t_i - INTNX(interval, t_i, 0, alignment)

spans             the number of intervals between each time ID value and the previous time ID value. The spans value is equivalent to the number returned by the following SAS expression:

                     span_i = INTCK(interval, t_(i-1), t_i)
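The same quantities can be computed directly in a DATA step. The following sketch assumes a data set WORK.TS that contains a SAS date variable DATE sorted in ascending order and uses a MONTH interval with BEGIN alignment; it mirrors the expressions above:

   data diagnostics;
      set ts;
      prev = lag(date);                              /* previous time ID value      */
      offset = date - intnx('month', date, 0, 'b');  /* days from the aligned value */
      if prev ne . then
         span = intck('month', prev, date);          /* intervals since previous ID */
      format date prev date9.;
   run;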
Diagnostic Output Representation

The TIMEID procedure produces time ID diagnostics as both time-ID-based and count-based frequency distributions to expose many of the possible problems that can occur in a time ID variable. The time-ID-based frequency distributions that are generated with the PLOT= option provide a detailed view of time ID values that can isolate problems with specific ID values. Figure 28.1 shows a time series that has a span of 10 observations in a weekday series, based on the results of the PLOT=(VALUES SPANS) option. The single large bar in the spans plot shows where data are omitted.
Figure 28.1 Time ID Decomposition
The count-based frequency distributions summarize features of the time ID variable. Individual printed and plotted outputs are available to describe the distribution of the number of spans, offsets, and interval counts that occur in the time ID variable. Figure 28.2 illustrates a count-based frequency distribution of the spans within the weekday series.
Figure 28.2 Span Count Distribution
The large bar at the span of 1 shows that most of the observations are correctly separated by one interval. The bar at 11 indicates that one observation is separated by 11 intervals from the preceding value of the time ID variable. This further illustrates a span of 10 omitted observations.
Inferring Time Intervals and Alignments

When the INTERVAL= option is not specified in the ID statement, a time interval is inferred from the time ID values in the input data set. The technique used to infer a time interval involves searching for the interval that fits the greatest number of time ID values. First, time ID values are sampled from the input data set to generate a set of candidate intervals. Then the candidate interval that is consistent with the greatest number of time ID values is chosen to represent the time series.

When the ALIGN=INFER option is specified, the convention that is used to specify time interval alignment is inferred from the time ID variable values by using a similar technique. When both the time interval and its alignment are to be inferred, each of the possible alignments, BEGIN, MIDDLE, and END, is considered in the search. Precedence in the search is given to intervals with the BEGIN alignment.
Data Set Output

The TIMEID procedure creates the OUTFREQ=, OUTINTERVAL=, and OUTINTERVALDETAILS= data sets. The OUTFREQ= and OUTINTERVALDETAILS= data sets contain the variables that are specified in the BY statement along with variables that characterize the time ID values. The OUTINTERVAL= option creates a data set without BY variables. The information in this data set summarizes time ID diagnostic information across all BY groups in the DATA= data set.
OUTFREQ= Data Set

The OUTFREQ= data set contains a single observation for each value of the time ID variable in the input data set for each BY group. Additionally, the following variables are written to the OUTFREQ= data set:

   _COUNT_     number of occurrences of the time ID value
   _PERCENT_   percentage of all time ID values
OUTINTERVAL= Data Set

The OUTINTERVAL= data set contains information that is similar to the variables written to the OUTINTERVALDETAILS= data set; however, the OUTINTERVAL= data set summarizes the information across all BY groups into a single observation. The following variables are written to the OUTINTERVAL= data set:

   TIMEID               time ID variable
   START                smallest time ID value
   END                  largest time ID value
   STARTSHARED          largest starting time ID value
   ENDSHARED            smallest ending time ID value
   NOBS                 number of observations
   N                    number of nonmissing observations
   NMISS                number of missing observations
   NBY                  number of BY groups
   NINVALID             number of invalid observations
   STATUS               status flag that indicates whether the requested analyses
                        were successful:
                          0     The analysis completed successfully.
                          4000  Inference of a time interval from the data set failed.
                          5000  Diagnosis of the DATA= data set for the specified
                                time interval failed.
   MSG                  a message that provides further details when the STATUS
                        variable is not zero
   INTERVAL             time interval that is specified or recommended
   INTNAME              time interval base name that is specified or recommended
   MULTIPLIER           time interval multiplier that is specified or recommended
   SHIFT                time interval shift that is specified or recommended
   ALIGNMENT            time interval alignment that is specified or recommended
   SEASONALITY          seasonality determined from the specified or recommended
                        time interval
   TOTALSEASONCYCLES    total number of seasonal cycles spanned by all the
                        observations
   SEASONCYCLESSHARED   number of seasonal cycles that are shared among all BY groups
   FORMAT               format of the time ID variable
OUTINTERVALDETAILS= Data Set

The OUTINTERVALDETAILS= data set contains statistics about the time interval that is specified in the ID statement or inferred from the time ID values for each BY group. The following variables represent these statistics:

   TIMEID          time ID variable name
   START           starting time ID value
   END             ending time ID value
   NOBS            number of observations
   N               number of nonmissing observations
   NMISS           number of missing observations
   NINVALID        number of invalid observations
   NINTCNTS        number of distinct interval count values
   PCTINTCNTS      percentage of interval counts greater than one
   MININTCNT       minimum of interval counts
   MAXINTCNT       maximum of interval counts
   MEANINTCNT      mean of interval counts
   STDINTCNT       standard deviation of interval counts
   MEDINTCNT       median of interval counts
   NOFFSETS        number of time ID offsets
   PCTOFFSETS      percentage of time ID offsets
   MINOFFSET       minimum of time ID offsets
   MAXOFFSET       maximum of time ID offsets
   MEANOFFSET      mean of time ID offsets
   STDOFFSET       standard deviation of time ID offsets
   MEDOFFSET       median of time ID offsets
   NSPANS          number of spans between time ID values
   PCTSPANS        percentage of spans between time ID values
   MINSPAN         minimum of spans between time ID values
   MAXSPAN         maximum of spans between time ID values
   MEANSPAN        mean of spans between time ID values
   STDSPAN         standard deviation of spans between time ID values
   MEDSPAN         median of spans between time ID values
   STATUS          status flag that indicates whether the requested analyses were
                   successful:
                     0     The analysis completed successfully.
                     4000  Inference of a time interval from the data set failed.
                     5000  Diagnosis of the DATA= data set for the specified time
                           interval failed.
   MSG             a message that provides further details when the STATUS variable
                   is not zero
   INTERVAL        time interval that is specified or recommended
   INTNAME         time interval base name that is specified or recommended
   MULTIPLIER      time interval multiplier that is specified or recommended
   SHIFT           time interval shift that is specified or recommended
   ALIGNMENT       time interval alignment that is specified or recommended
   SEASONALITY     seasonality determined from the specified or recommended time
                   interval
   NSEASONCYCLES   number of seasonal cycles spanned by the time ID values
   FORMAT          format of the time ID variable
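Because the STATUS variable flags BY groups for which the analysis failed, the output data set can be screened programmatically. The following sketch assumes an OUTINTERVALDETAILS= data set named WORK.INTDET and writes a log note for each failed BY group:

   data _null_;
      set intdet;
      if status ne 0 then
         put 'NOTE: Time interval diagnosis failed: ' msg;
   run;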
Printed Tabular Output

The TIMEID procedure optionally produces printed output by using the Output Delivery System (ODS). By default, the procedure produces no printed output. The appearance of the printed tabular output is controlled by the PRINT= option in the PROC TIMEID statement. Table 28.2 relates the PRINT= options to the names of the ODS tables.

Table 28.2 ODS Tables Produced in PROC TIMEID

ODS Table Name              Description                                     PRINT= Option
DataSet                     Information about the input data set            ALL
Decomposition               Time ID counts, offsets, and spans              VALUES
Interval                    Information about the time interval             INTERVAL
IntervalCountsComponent     Frequency distribution of interval counts       INTERVALCOUNTS
IntervalCountsStatistics    Statistics on interval count frequency
                            distribution                                    INTERVALCOUNTS
OffsetsComponent            Frequency distribution of offsets               OFFSETS
OffsetStatistics            Statistics on offset frequency distribution     OFFSETS
SpansComponent              Frequency distribution of spans                 SPANS
SpanStatistics              Statistics on the span frequency distribution   SPANS
Values                      Time ID value counts                            VALUES
ValueSummary                Summary of the number of valid observations     VALUES
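The ODS table names in Table 28.2 can also be used with the ODS OUTPUT statement to capture a printed table in a data set. For example, the following sketch saves the interval counts table for the triweekly data set that is analyzed in Example 28.1 later in this chapter:

   ods output IntervalCountsComponent=intcnts;
   proc timeid data=triweek print=intervalcounts;
      id date interval=week3;
   run;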
ODS Graphics

The TIMEID procedure uses ODS Graphics to produce plotted output as specified by the PLOT= option. Table 28.3 relates the PLOT= options to the names of the ODS Graphics objects.

Table 28.3 ODS Graphics Produced by the PLOT= Option in PROC TIMEID

ODS Graph Name                 Plot Description                                  PLOT= Option
DecompositionPlot              Panel of spans, offsets, and counts for each
                               time interval                                     VALUES
IntervalCountsComponentPlot    Histogram of interval counts                      INTERVALCOUNTS
IntervalCountsPlot             Plot of counts for each time interval value       VALUES
OffsetComponentPlot            Histogram of time ID offsets                      OFFSETS
OffsetsPlot                    Plot of offsets for each time interval value      VALUES
SpanComponentPlot              Histogram of span sizes between time ID values    SPANS
SpansPlot                      Plot of spans for each time interval value        VALUES
ValuesPlot                     Plot of counts of each time ID value              VALUES
Examples: TIMEID Procedure
Example 28.1: Examining a Weekly Time ID Variable

This example illustrates how problems in a weekly time series can be visualized and quantified using the TIMEID procedure's diagnostic capabilities. The following DATA step creates a data set that contains time values spaced in three-week intervals, where some weeks have been skipped or duplicated and some have been recorded on different weekdays.

   data triweek;
      format date date.;
      input date : date. @@;
   datalines;
   28DEC48 18JAN49 08FEB49 01MAR49
   22MAR49 12APR49 03MAY49 24MAY49
   17JUN49 05JUL49 26JUL49 16AUG49
   06SEP49 27SEP49 18OCT49 08NOV49

   ... more lines ...
The following TIMEID procedure statements generate an ODS display of the time series that characterizes interval counts, offsets, and spans in the time ID variable.

   proc timeid data=triweek print=all plot=all;
      id date interval=week3;
   run;
The Time ID decomposition listing and plot shown in Output 28.1.1 and Output 28.1.2 summarize how well the WEEK3 interval fits the time ID values by showing the number of counts, offsets, and spans for each time interval that is represented by the DATE variable. The listing in Output 28.1.1 has been truncated to include only the first 10 observations. The Time ID plots in Output 28.1.2 indicate that there are duplicated time ID values for a three-week time interval in the Counts plot. The duplicated time intervals have a Count value of 2. The Offsets plot shows which days in the 21 day cycle have been used to record each time interval in the series. The Spans plot records values of 2 for six time intervals where no observations were recorded in the previous interval. The three component plots are histogram summaries of the diagnostic quantities plotted against individual intervals in the decomposition plots. The component plots can be useful in diagnosing time series that contain many time intervals.
Output 28.1.1 Time ID Decomposition Listing

                     Time Component Value

   Value                                           Interval
   Index   date                 Offset    Span     Count
       1   Sun, 12 Dec 1948         16       .         1
       2   Sun,  2 Jan 1949         16       1         1
       3   Sun, 23 Jan 1949         16       1         1
       4   Sun, 13 Feb 1949         16       1         1
       5   Sun,  6 Mar 1949         16       1         1
       6   Sun, 27 Mar 1949         16       1         1
       7   Sun, 17 Apr 1949         16       1         1
       8   Sun,  8 May 1949         16       1         1
       9   Sun, 29 May 1949         19       1         1
      10   Sun, 19 Jun 1949         16       1         1

Output 28.1.2 Time ID Decomposition Plot
Output 28.1.3 and Output 28.1.4 describe the distribution of counts of duplicated WEEK3 intervals in the TriWeek data set. For this data set there are 134 intervals that contain one DATE value, and 10 intervals that contain two DATE values.
Output 28.1.3 Time ID Interval Counts Listings

                   The TIMEID Procedure

                     Component Value

   Value    Interval
   Index    Count       Frequency    Percentage
       1    1                 134     93.055556
       2    2                  10      6.944444

                    Statistics Summary

                                         Standard
   Minimum    Maximum         Mean     Deviation
         1          2    1.0694444     1.1004981

Output 28.1.4 Time ID Interval Counts Histogram
The offsets diagnostics Output 28.1.5 and Output 28.1.6 show the distribution of days in the 21-day WEEK3 interval used to record the time intervals in the series. The observations in the TriWeek data set represent intervals with five different offsets from the beginning of the WEEK3 interval: 0, 16, 18, 19, and 20. The high prevalence of intervals with offset 16 indicates that the TriWeek data set would be represented better using the WEEK3.17 interval.

Output 28.1.5 Time ID Offsets Listings

                   The TIMEID Procedure

   Value
   Index    Offset    Frequency    Percentage
       1         0            1      0.694444
       2        16          138     95.833333
       3        18            1      0.694444
       4        19            1      0.694444
       5        20            3      2.083333

                    Statistics Summary

                                         Standard
   Minimum    Maximum         Mean     Deviation
         0         20    16.006944     1.7006205
Output 28.1.6 Time ID Offsets Histogram
The span diagnostics Output 28.1.7 and Output 28.1.8 show the distribution of the span sizes between successive DATE values. The TriWeek data set has three different span sizes of widths 0, 1, and 2. Here one span corresponds to the width of a WEEK3 interval.

Output 28.1.7 Time ID Span Listings

                   The TIMEID Procedure

   Value
   Index    Span    Frequency    Percentage
       1       0            1      0.704225
       2       1          135     95.070423
       3       2            6      4.225352

                    Statistics Summary

                                         Standard
   Minimum    Maximum         Mean     Deviation
         0          2    1.0352113     0.6367974
Output 28.1.8 Time ID Span Histogram
Output 28.1.9 and Output 28.1.10 show the distribution of time ID values before alignment to the WEEK3 interval. The listing in Output 28.1.9 has been truncated to include only the first 10 observations.
Output 28.1.9 Unaligned Time ID Listings

                  Time ID Values for DATE

   Value
   Index   date                 Frequency    Percentage
       1   Tue, 28 Dec 1948             1      0.694444
       2   Tue, 18 Jan 1949             1      0.694444
       3   Tue,  8 Feb 1949             1      0.694444
       4   Tue,  1 Mar 1949             1      0.694444
       5   Tue, 22 Mar 1949             1      0.694444
       6   Tue, 12 Apr 1949             1      0.694444
       7   Tue,  3 May 1949             1      0.694444
       8   Tue, 24 May 1949             1      0.694444
       9   Fri, 17 Jun 1949             1      0.694444
      10   Tue,  5 Jul 1949             1      0.694444

Output 28.1.10 Unaligned Time ID Histogram
Example 28.2: Inferring a Date Interval

This example illustrates how a time ID variable can be inferred from a data set when a sufficient number of observations are present.

   data workdays;
      format day weekdate.;
      input day : date. @@;
   datalines;
   01AUG09 06AUG09 11AUG09 14AUG09
   19AUG09 22AUG09 27AUG09 01SEP09
   04SEP09 09SEP09 12SEP09 17SEP09
   ;

   proc timeid data=workdays print=interval;
      id day;
   run;
The 12 observations in the WorkDays data set are enough to determine that the DAY time ID variable is represented by the WEEKDAY12W3 interval. The WEEKDAY12W3 interval corresponds to every third day of the week, excluding Sundays and Mondays. Characteristics of this interval are shown in Output 28.2.1.

Output 28.2.1 Inferred Time Interval Information

                   The TIMEID Procedure

               Time Interval Analysis Summary

   Time ID Variable            day
   Time Interval               WEEKDAY12W3
   Base Name                   WEEKDAY
   Multiplier                  3
   Shift                       0
   Length of Seasonal Cycle    5
   Time ID Format              WEEKDATE
   Start                       Saturday, August 1, 2009
   End                         Thursday, September 17, 2009
Example 28.3: Examining Multiple BY Groups

This example illustrates how a time ID variable can be examined independently over each BY group and summarized over all observations in the DATA= data set.

   data bygroups;
      format tid date.;
      input tid : date. by @@;
   datalines;

   ... more lines ...
The following TIMEID procedure statements generate two data sets that summarize a data set with four BY groups.

   proc timeid data=bygroups outintervaldetails=int outinterval=intsum;
      id tid;
      by by;
   run;
The summarized information in Output 28.3.1 shows that BY groups 2, 3, and 4 in the ByGroups data set contain some duplicate values and spans, and that group 1 conforms exactly to the WEEKDAY17W interval. This listing also shows that the date ranges of the BY groups start and end on different days and that they overlap between December 7, 2009, and December 28, 2009.
Output 28.3.1 Selected Variables in the Combined OUTINTERVALDETAILS= and OUTINTERVAL= Data Sets

   by     N   NINTCNTS   PCTINTCNTS   NOFFSETS   PCTOFFSETS   NSPANS   PCTSPANS   STATUS
    1    25          1         0.00          1            0        1    0.00000        0
    2    25          2         0.08          1            0        2    0.00000        0
    3    25          2         0.16          1            0        2    0.04348        0
    4    25          2         0.24          1            0        2    0.13043        0
    .   100          .            .          .            .        .          .        0

   by   START     END       INTERVAL     SEASONALITY   NSEASONCYCLES   STARTSHARED   ENDSHARED   NBY   TOTALSEASONCYCLES   SEASONCYCLESSHARED
    1   24NOV09   28DEC09   WEEKDAY17W             5               5             .           .     .                   .                    .
    2   27NOV09   31DEC09   WEEKDAY17W             5               5             .           .     .                   .                    .
    3   02DEC09   05JAN10   WEEKDAY17W             5               5             .           .     .                   .                    .
    4   07DEC09   08JAN10   WEEKDAY17W             5               4             .           .     .                   .                    .
    .   24NOV09   08JAN10   WEEKDAY17W             5               .       07DEC09     28DEC09     4                   6                    3
Chapter 29
The TIMESERIES Procedure

Contents
    Overview: TIMESERIES Procedure                                           1850
    Getting Started: TIMESERIES Procedure                                    1851
    Syntax: TIMESERIES Procedure                                             1854
        Functional Summary                                                   1854
        PROC TIMESERIES Statement                                            1857
        BY Statement                                                         1860
        CORR Statement                                                       1861
        CROSSCORR Statement                                                  1862
        DECOMP Statement                                                     1863
        ID Statement                                                         1865
        SEASON Statement                                                     1868
        SPECTRA Statement                                                    1869
        SSA Statement                                                        1871
        TREND Statement                                                      1873
        VAR and CROSSVAR Statements                                          1874
    Details: TIMESERIES Procedure                                            1876
        Accumulation                                                         1876
        Missing Value Interpretation                                         1879
        Time Series Transformation                                           1879
        Time Series Differencing                                             1880
        Descriptive Statistics                                               1880
        Seasonal Decomposition                                               1881
        Correlation Analysis                                                 1883
        Cross-Correlation Analysis                                           1884
        Spectral Density Analysis                                            1885
        Singular Spectrum Analysis                                           1888
        Data Set Output                                                      1890
        OUT= Data Set                                                        1891
        OUTCORR= Data Set                                                    1891
        OUTCROSSCORR= Data Set                                               1892
        OUTDECOMP= Data Set                                                  1893
        OUTSEASON= Data Set                                                  1894
        OUTSPECTRA= Data Set                                                 1895
        OUTSSA= Data Set                                                     1895
        OUTSUM= Data Set                                                     1896
        OUTTREND= Data Set                                                   1897
        _STATUS_ Variable Values                                             1898
        Printed Output                                                       1898
        ODS Table Names                                                      1899
        ODS Graphics Names                                                   1899
    Examples: TIMESERIES Procedure                                           1901
        Example 29.1: Accumulating Transactional Data into Time Series Data  1901
        Example 29.2: Trend and Seasonal Analysis                            1902
        Example 29.3: Illustration of ODS Graphics                           1907
        Example 29.4: Illustration of Spectral Analysis                      1911
        Example 29.5: Illustration of Singular Spectrum Analysis             1913
    References                                                               1916
Overview: TIMESERIES Procedure

The TIMESERIES procedure analyzes time-stamped transactional data with respect to time and accumulates the data into a time series format. The procedure can perform trend and seasonal analysis on the transactions. After the transactional data are accumulated, time domain and frequency domain analysis can be performed on the accumulated time series.

For seasonal analysis of the transaction data, various statistics can be computed for each season. For trend analysis of the transaction data, various statistics can be computed for each time period. The analysis is similar to applying the MEANS procedure of Base SAS software to each season or time period of concern.

After the transactional data are accumulated to form a time series and any missing values are interpreted, the accumulated time series can be functionally transformed using log, square root, logistic, or Box-Cox transformations. The time series can be further transformed using simple and/or seasonal differencing. After functional and difference transformations have been applied, the accumulated and transformed time series can be stored in an output data set. This working time series can then be analyzed further using various time series analysis techniques provided by this procedure or other SAS/ETS procedures. Time series analyses performed by the TIMESERIES procedure include the following:

   descriptive (global) statistics
   seasonal decomposition/adjustment analysis
   correlation analysis
   cross-correlation analysis
   spectral analysis
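As an illustration of the accumulation and transformation steps described above, the following sketch requests a log transformation and a simple difference through options of the VAR statement that are documented later in this chapter. The data set and variable names follow the banking example introduced in the next section.

   proc timeseries data=transactions out=adjusted;
      by customer;
      id date interval=month accumulate=total;  /* accumulate to monthly totals   */
      var withdrawals / transform=log dif=1;    /* log transform, then difference */
   run;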
All results of the transactional or time series analysis can be stored in output data sets or printed using the Output Delivery System (ODS). The TIMESERIES procedure can process large amounts of time-stamped transactional data. Therefore, the analysis results are useful for large-scale time series analysis or (temporal) data mining. All of the results can be stored in output data sets in either a time series format (default) or in a coordinate format (transposed). The time series format is useful for preparing the data for subsequent analysis with other SAS/ETS procedures. For example, the working time series can be further analyzed, modeled, and forecast with other SAS/ETS procedures. The coordinate format is useful when using this procedure with SAS/STAT procedures or SAS Enterprise Miner. For example, clustering time-stamped transactional data can be achieved by using the results of this procedure with the clustering procedures of SAS/STAT and the nodes of SAS Enterprise Miner. The EXPAND procedure can be used for the frequency conversion and transformations of time series output from this procedure.
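For example, a daily series produced by the TIMESERIES procedure can be converted to monthly totals with the EXPAND procedure. The following sketch assumes the WORK.TIMESERIES data set that is created in the next section:

   proc expand data=timeseries out=monthly from=day to=month;
      id date;
      convert withdrawals / observed=total;
   run;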
Getting Started: TIMESERIES Procedure

This section outlines the use of the TIMESERIES procedure and gives a cursory description of some of the analysis techniques that can be performed on time-stamped transactional data.

Given an input data set that contains numerous transaction variables recorded over time at no specific frequency, the TIMESERIES procedure can form time series as follows:

   PROC TIMESERIES DATA=<input-data-set> OUT=<output-data-set>;
      ID <time-ID-variable> INTERVAL=<interval> ACCUMULATE=<statistic>;
      VAR <time-series-variables>;
   RUN;
The TIMESERIES procedure forms time series from the input time-stamped transactional data. It can provide results in output data sets or in other output formats by using the Output Delivery System (ODS).

Time-stamped transactional data are often recorded at no fixed interval. Analysts often want to use time series analysis techniques that require fixed-time intervals. Therefore, the transactional data must be accumulated to form a fixed-interval time series.

Suppose that a bank wants to analyze the transactions associated with each of its customers over time. Further, suppose that the data set WORK.TRANSACTIONS contains four variables that are related to these transactions: CUSTOMER, DATE, WITHDRAWALS, and DEPOSITS. The following examples illustrate possible ways to analyze these transactions by using the TIMESERIES procedure.

To accumulate the time-stamped transactional data to form a daily time series based on the accumulated daily totals of each type of transaction (WITHDRAWALS and DEPOSITS), the following TIMESERIES procedure statements can be used:
   proc timeseries data=transactions out=timeseries;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
The OUT=TIMESERIES option specifies that the resulting time series data for each customer is to be stored in the data set WORK.TIMESERIES. The INTERVAL=DAY option specifies that the transactions are to be accumulated on a daily basis. The ACCUMULATE=TOTAL option specifies that the sum of the transactions is to be calculated. After the transactional data is accumulated into a time series format, many of the procedures provided with SAS/ETS software can be used to analyze the resulting time series data. For example, the ARIMA procedure can be used to model and forecast each customer's withdrawal data by using an ARIMA(0,1,1)(0,1,1)s model (where the number of seasons is s=7 days in a week) using the following statements:

   proc arima data=timeseries;
      identify var=withdrawals(1,7) noprint;
      estimate q=(1)(7) outest=estimates noprint;
      forecast id=date interval=day out=forecasts;
   quit;
The OUTEST=ESTIMATES data set contains the parameter estimates of the model specified. The OUT=FORECASTS data set contains forecasts based on the model specified. See the ARIMA procedure chapter for more details. A single set of transactions can be very large and must be summarized in order to be analyzed effectively. Analysts often want to examine transactional data for trends and seasonal variation. To analyze transactional data for trends and seasonality, statistics must be computed for each time period and season of concern. For each observation, the time period and season must be determined and the data must be analyzed based on this determination. The following statements illustrate how to use the TIMESERIES procedure to perform trend and seasonal analysis of time-stamped transactional data:

   proc timeseries data=transactions out=out
                   outseason=season outtrend=trend;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
Since the INTERVAL=DAY option is specified, the length of the seasonal cycle is seven (7), where the first season is Sunday and the last season is Saturday. The output data set specified by the OUTSEASON=SEASON option contains the seasonal statistics for each day of the week by each customer. The output data set specified by the OUTTREND=TREND option contains the trend statistics for each day of the calendar by each customer.
Often it is desired to seasonally decompose a time series into seasonal, trend, cycle, and irregular components or to seasonally adjust the series. These techniques describe how the changing seasons influence the time series. The following statements illustrate how to use the TIMESERIES procedure to perform seasonal adjustment/decomposition analysis of time-stamped transactional data:

   proc timeseries data=transactions out=out outdecomp=decompose;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
The data set specified by the OUTDECOMP=DECOMPOSE option contains the decomposed/adjusted time series for each customer. A single time series can be very large. Often, a time series must be summarized with respect to time lags in order to be efficiently analyzed using time domain techniques. These techniques help describe how a current observation is related to the past observations with respect to the time (season) lag. The following statements illustrate how to use the TIMESERIES procedure to perform time domain analysis of time-stamped transactional data:

   proc timeseries data=transactions out=out outcorr=timedomain;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
The data set specified by the OUTCORR=TIMEDOMAIN option contains the time domain statistics, such as sample autocorrelations and partial autocorrelations, by each customer. Sometimes time series data contain underlying patterns that can be identified using spectral analysis techniques. Two kinds of spectral analyses on univariate data can be performed using the TIMESERIES procedure: singular spectrum analysis and Fourier spectral analysis. Singular spectrum analysis (SSA) is a technique for decomposing a time series into additive components and categorizing these components based on the magnitudes of their contributions. SSA uses a single parameter, the window length, to quantify patterns in a time series without relying on prior information about the series' structure. The window length represents the maximum lag that is considered in the analysis, and it corresponds to the dimensionality of the principal components analysis (PCA) on which SSA is based. The components are combined into groups to categorize their roles in the SSA decomposition. Fourier spectral analysis decomposes a time series into a sum of harmonics. In the discrete Fourier transform, the contribution of components at evenly spaced frequencies is quantified in a periodogram and summarized in spectral density estimates.
The following statements illustrate how to use the TIMESERIES procedure to analyze time-stamped transactional data without prior information about the series' structure:

   proc timeseries data=transactions outssa=ssa outspectra=spectra;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;
The data set specified by the OUTSSA=SSA option contains a singular spectrum analysis of the withdrawals and deposits data. The data set specified by the OUTSPECTRA=SPECTRA option contains a Fourier spectral decomposition of the same data. By default, the TIMESERIES procedure produces no printed output.
Syntax: TIMESERIES Procedure

The TIMESERIES procedure uses the following statements:

   PROC TIMESERIES options ;
      BY variables ;
      CORR statistics-list / options ;
      CROSSCORR statistics-list / options ;
      CROSSVAR variable-list / options ;
      DECOMP component-list / options ;
      ID variable INTERVAL= interval-option ;
      SEASON statistics-list / options ;
      SPECTRA statistics-list / options ;
      SSA / options ;
      TREND statistics-list / options ;
      VAR variable-list / options ;
Functional Summary

Table 29.1 summarizes the statements and options that control the TIMESERIES procedure.

Table 29.1  TIMESERIES Functional Summary

Description                                           Statement            Option

Statements
Specifies BY-group processing                         BY
Specifies variables to analyze                        VAR
Specifies cross variables to analyze                  CROSSVAR
Specifies the time ID variable                        ID
Specifies correlation options                         CORR
Specifies cross-correlation options                   CROSSCORR
Specifies decomposition options                       DECOMP
Specifies seasonal statistics options                 SEASON
Specifies spectral analysis options                   SPECTRA
Specifies SSA options                                 SSA
Specifies trend statistics options                    TREND

Data Set Options
Specifies the input data set                          PROC TIMESERIES      DATA=
Specifies the output data set                         PROC TIMESERIES      OUT=
Specifies the correlations output data set            PROC TIMESERIES      OUTCORR=
Specifies the cross-correlations output data set      PROC TIMESERIES      OUTCROSSCORR=
Specifies the decomposition output data set           PROC TIMESERIES      OUTDECOMP=
Specifies the seasonal statistics output data set     PROC TIMESERIES      OUTSEASON=
Specifies the spectral analysis output data set       PROC TIMESERIES      OUTSPECTRA=
Specifies the SSA output data set                     PROC TIMESERIES      OUTSSA=
Specifies the summary statistics output data set      PROC TIMESERIES      OUTSUM=
Specifies the trend statistics output data set        PROC TIMESERIES      OUTTREND=

Accumulation and Seasonality Options
Specifies the accumulation frequency                  ID                   INTERVAL=
Specifies the length of seasonal cycle                PROC TIMESERIES      SEASONALITY=
Specifies the interval alignment                      ID                   ALIGN=
Specifies the interval boundary alignment             ID                   BOUNDARYALIGN=
Specifies that time ID variable values not be sorted  ID                   NOTSORTED
Specifies the starting time ID value                  ID                   START=
Specifies the ending time ID value                    ID                   END=
Specifies the accumulation statistic                  ID, VAR, CROSSVAR    ACCUMULATE=
Specifies missing value interpretation                ID, VAR, CROSSVAR    SETMISSING=

Time-Stamped Data Seasonal Statistics Options
Specifies the form of the output data set             SEASON               TRANSPOSE=

Fourier Spectral Analysis Options
Specifies whether to adjust to the series mean        SPECTRA              ADJMEAN
Specifies confidence limits                           SPECTRA              ALPHA=
Specifies the kernel weighting function               SPECTRA              PARZEN | BART | TUK | TRUNC | QS
Specifies the domain where kernel functions apply     SPECTRA              DOMAIN=
Specifies the constant bandwidth parameter            SPECTRA              C=
Specifies the exponent kernel parameter               SPECTRA              EXPON=
Specifies the periodogram weights                     SPECTRA              WEIGHTS

Singular Spectrum Analysis Options
Specifies the grouping of principal components        SSA                  GROUPS=
Specifies the window length                           SSA                  LENGTH=
Specifies the number of time periods in the
   transposed output                                  SSA                  NPERIODS=
Specifies the division between principal
   component groupings                                SSA                  THRESHOLDPCT=
Specifies that the output be transposed               SSA                  TRANSPOSE=

Time-Stamped Data Trend Statistics Options
Specifies the form of the output data set             TREND                TRANSPOSE=
Specifies the number of time periods to be stored     TREND                NPERIODS=

Time Series Transformation Options
Specifies simple differencing                         VAR, CROSSVAR        DIF=
Specifies seasonal differencing                       VAR, CROSSVAR        SDIF=
Specifies transformation                              VAR, CROSSVAR        TRANSFORM=

Time Series Correlation Options
Specifies the list of lags                            CORR                 LAGS=
Specifies the number of lags                          CORR                 NLAG=
Specifies the number of parameters                    CORR                 NPARMS=
Specifies the form of the output data set             CORR                 TRANSPOSE=

Time Series Cross-Correlation Options
Specifies the list of lags                            CROSSCORR            LAGS=
Specifies the number of lags                          CROSSCORR            NLAG=
Specifies the form of the output data set             CROSSCORR            TRANSPOSE=

Time Series Decomposition Options
Specifies the mode of decomposition                   DECOMP               MODE=
Specifies the Hodrick-Prescott filter parameter       DECOMP               LAMBDA=
Specifies the number of time periods to be stored     DECOMP               NPERIODS=
Specifies the form of the output data set             DECOMP               TRANSPOSE=

Printing Control Options
Specifies the time ID format                          ID                   FORMAT=
Specifies which output to print                       PROC TIMESERIES      PRINT=
Specifies that detailed output be printed             PROC TIMESERIES      PRINTDETAILS

Miscellaneous Options
Specifies that analysis variables be processed
   in sorted order                                    PROC TIMESERIES      SORTNAMES
Limits error and warning messages                     PROC TIMESERIES      MAXERROR=

ODS Graphics Options
Specifies the cross-variable graphical output         PROC TIMESERIES      CROSSPLOTS=
Specifies the variable graphical output               PROC TIMESERIES      PLOTS=
PROC TIMESERIES Statement

   PROC TIMESERIES options ;
The following options can be used in the PROC TIMESERIES statement: DATA= SAS-data-set
names the SAS data set that contains the input data for the procedure to create the time series. If the DATA= option is not specified, the most recently created SAS data set is used. CROSSPLOTS= option | ( options )
specifies the cross-variable graphical output desired. By default, the TIMESERIES procedure produces no graphical output. The following plotting options are available: SERIES
plots the time series (OUT= data set).
CCF
plots the cross-correlation functions (OUTCROSSCORR= data set).
ALL
same as CROSSPLOTS=(SERIES CCF).
For example, CROSSPLOTS=SERIES plots the two time series. The CROSSPLOTS= option produces graphical output for these results by using the Output Delivery System (ODS). The CROSSPLOTS= option produces results similar to the data sets listed in parentheses next to the preceding options. MAXERROR= number
limits the number of warning and error messages that are produced during the execution of the
procedure to the specified value. The default is MAXERROR=50. This option is particularly useful in BY-group processing, where it can be used to suppress recurring messages. OUT= SAS-data-set
names the output data set to contain the time series variables specified in the subsequent VAR and CROSSVAR statements. If BY variables are specified, they are also included in the OUT= data set. If an ID variable is specified, it is also included in the OUT= data set. The values are accumulated based on the ID statement INTERVAL= or the ACCUMULATE= option or both. The OUT= data set is particularly useful when you want to further analyze, model, or forecast the resulting time series with other SAS/ETS procedures. OUTCORR= SAS-data-set
names the output data set to contain the univariate time domain statistics. OUTCROSSCORR= SAS-data-set
names the output data set to contain the cross-correlation statistics. OUTDECOMP= SAS-data-set
names the output data set to contain the decomposed and/or seasonally adjusted time series. OUTSEASON= SAS-data-set
names the output data set to contain the seasonal statistics. The statistics are computed for each season as specified by the ID statement INTERVAL= option or the PROC TIMESERIES statement SEASONALITY= option. The OUTSEASON= data set is particularly useful when analyzing transactional data for seasonal variations. OUTSPECTRA= SAS-data-set
names the output data set to contain the univariate frequency domain analysis results. OUTSSA= SAS-data-set
names the output data set to contain the singular spectrum analysis result series. OUTSUM= SAS-data-set
names the output data set to contain the descriptive statistics. The descriptive statistics are based on the accumulated time series when the ACCUMULATE= and/or SETMISSING= options are specified in the ID or VAR statements. The OUTSUM= data set is particularly useful when analyzing large numbers of series and a summary of the results is needed. OUTTREND= SAS-data-set
names the output data set to contain the trend statistics. The statistics are computed for each time period as specified by the ID statement INTERVAL= option. The OUTTREND= data set is particularly useful when analyzing transactional data for trends. PLOTS= option | ( options )
specifies the univariate graphical output desired. By default, the TIMESERIES procedure produces no graphical output. The following plotting options are available: SERIES
plots the time series (OUT= data set).
RESIDUAL
plots the residual time series (OUT= data set).
CYCLES
plots the seasonal cycles (OUT= data set).
CORR
plots the correlation panel (OUTCORR= data set).
ACF
plots the autocorrelation function (OUTCORR= data set).
PACF
plots the partial autocorrelation function (OUTCORR= data set).
IACF
plots the inverse autocorrelation function (OUTCORR= data set).
WN
plots the white noise probabilities (OUTCORR= data set).
DECOMP
plots the seasonal adjustment panel (OUTDECOMP= data set).
TCS
plots the trend-cycle-seasonal component (OUTDECOMP= data set).
TCC
plots the trend-cycle component (OUTDECOMP= data set).
SIC
plots the seasonal-irregular component (OUTDECOMP= data set).
SC
plots the seasonal component (OUTDECOMP= data set).
SA
plots the seasonal adjusted component (OUTDECOMP= data set).
PCSA
plots the percent change in the seasonal adjusted component (OUTDECOMP= data set).
IC
plots the irregular component (OUTDECOMP= data set).
TC
plots the trend component (OUTDECOMP= data set).
CC
plots the cycle component (OUTDECOMP= data set).
PERIODOGRAM
plots the periodogram (OUTSPECTRA= data set).
SPECTRUM
plots the spectral density estimate (OUTSPECTRA= data set).
SSA
plots the singular spectrum analysis results (OUTSSA= data set).
ALL
same as PLOTS=(SERIES ACF PACF IACF WN SSA).
For example, PLOTS=SERIES plots the time series. The PLOTS= option produces graphical output for these results by using the Output Delivery System (ODS). The PLOTS= option produces results similar to the data sets listed in parentheses next to the preceding options. PRINT= option | ( options )
specifies the printed output desired. By default, the TIMESERIES procedure produces no printed output. The following printing options are available: DECOMP
prints the seasonal decomposition/adjustment table (OUTDECOMP= data set).
SEASONS
prints the seasonal statistics table (OUTSEASON= data set).
DESCSTATS
prints the descriptive statistics for the accumulated time series (OUTSUM= data set).
SUMMARY
prints the descriptive statistics table for all time series (OUTSUM= data set).
TRENDS
prints the trend statistics table (OUTTREND= data set).
SSA
prints the singular spectrum analysis results (OUTSSA= data set).
ALL
same as PRINT=(DESCSTATS SUMMARY).
For example, PRINT=SEASONS prints the seasonal statistics. The PRINT= option produces printed output for these results by using the Output Delivery System (ODS). The PRINT= option produces results similar to the data sets listed in parentheses next to the preceding options. PRINTDETAILS
specifies that output requested with the PRINT= option be printed in greater detail. SEASONALITY= number
specifies the length of the seasonal cycle. For example, SEASONALITY=3 means that every group of three time periods forms a seasonal cycle. By default, the length of the seasonal cycle is one (no seasonality) or the length implied by the INTERVAL= option specified in the ID statement. For example, INTERVAL=MONTH implies that the length of the seasonal cycle is 12. SORTNAMES
specifies that the variables specified in the VAR and CROSSVAR statements be processed in sorted order by the variable names. This option allows the output data sets to be presorted by the variable names.
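For example, the following statements combine several of these options. This is a sketch that reuses the bank transactions example from the section "Getting Started: TIMESERIES Procedure"; the output data set names SERIES and SUMMARY are arbitrary:

   /* accumulate daily totals, store the series and summary statistics,
      process variables in sorted order, and limit log messages */
   proc timeseries data=transactions out=series outsum=summary
                   sortnames maxerror=10 print=(descstats);
      by customer;
      id date interval=day accumulate=total;
      var withdrawals deposits;
   run;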
BY Statement

A BY statement can be used with PROC TIMESERIES to obtain separate analyses for groups of observations defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If your input data set is not sorted in ascending order, use one of the following alternatives:

   Sort the data by using the SORT procedure with a similar BY statement.

   Specify the option NOTSORTED or DESCENDING in the BY statement for the TIMESERIES
   procedure. The NOTSORTED option does not mean that the data are unsorted but rather that
   the data are arranged in groups (according to values of the BY variables) and that these
   groups are not necessarily in alphabetical or increasing numeric order.

   Create an index on the BY variables by using the DATASETS procedure.

For more information about the BY statement, see SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.
CORR Statement

   CORR statistics < / options > ;
A CORR statement can be used with the TIMESERIES procedure to specify options related to time domain analysis of the accumulated time series. Only one CORR statement is allowed. The following time domain statistics are available: LAG
time lag
N
number of variance products
ACOV
autocovariances
ACF
autocorrelations
ACFSTD
autocorrelation standard errors
ACF2STD
an indicator of whether autocorrelations are less than (–1), greater than (1), or within (0) two standard errors of zero
ACFNORM
normalized autocorrelations
ACFPROB
autocorrelation probabilities
ACFLPROB
autocorrelation log probabilities
PACF
partial autocorrelations
PACFSTD
partial autocorrelation standard errors
PACF2STD
an indicator of whether partial autocorrelation are less than (–1), greater than (1), or within (0) two standard errors of zero
PACFNORM
partial normalized autocorrelations
PACFPROB
partial autocorrelation probabilities
PACFLPROB
partial autocorrelation log probabilities
IACF
inverse autocorrelations
IACFSTD
inverse autocorrelation standard errors
IACF2STD
an indicator of whether the inverse autocorrelation is less than (–1), greater than (1) or within (0) two standard errors of zero
IACFNORM
normalized inverse autocorrelations
IACFPROB
inverse autocorrelation probabilities
IACFLPROB
inverse autocorrelation log probabilities
WN
white noise test statistics
WNPROB
white noise test probabilities
WNLPROB
white noise test log probabilities
If none of the correlation statistics are specified, the default is as follows:
corr lag n acov acf acfstd pacf pacfstd iacf iacfstd wn wnprob;
The following options can be specified in the CORR statement following the slash (/): NLAG= number
specifies the number of lags to be stored in the OUTCORR= data set or to be plotted. The default is 24 or three times the length of the seasonal cycle, whichever is smaller. The LAGS= option takes precedence over the NLAG= option. LAGS= (numlist)
specifies the list of lags to be stored in the OUTCORR= data set or to be plotted. The list of lags must be separated by spaces or commas. For example, LAGS=(1,3) specifies the first and third lags. NPARMS= number
specifies the number of parameters used in the model that created the residual time series. The number of parameters determines the degrees of freedom associated with the Ljung-Box statistics. The default is NPARMS=0. This option is useful when analyzing the residuals of a time series model whose number of parameters is specified by the NPARMS= option. TRANSPOSE= NO|YES
specifies which values are recorded as column names in the OUTCORR= data set. TRANSPOSE=YES specifies that the lags be recorded as the column names instead of the correlation statistics. The TRANSPOSE=NO option is useful for graphing the correlation results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the correlation results with other SAS procedures such as the CLUSTER procedure of SAS/STAT or SAS Enterprise Miner software. The default is TRANSPOSE=NO.
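For example, the following sketch (again based on the bank transactions example; the output data set name CORRSTATS is arbitrary) stores autocorrelation-related statistics for the first 12 lags in transposed form, which is convenient for subsequent clustering:

   /* request selected correlation statistics for 12 lags,
      with lags recorded as the column names */
   proc timeseries data=transactions outcorr=corrstats;
      by customer;
      id date interval=day accumulate=total;
      corr lag acf acfstd pacf / nlag=12 transpose=yes;
      var withdrawals;
   run;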
CROSSCORR Statement

   CROSSCORR statistics < / options > ;
A CROSSCORR statement can be used with the TIMESERIES procedure to specify options that are related to cross-correlation analysis of the accumulated time series. Only one CROSSCORR statement is allowed. The following time domain statistics are available: LAG
time lag
N
number of variance products
CCOV
cross covariances
CCF
cross-correlations
CCFSTD
cross-correlation standard errors
CCF2STD
an indicator of whether cross-correlations are less than (–1), greater than (1), or within (0) two standard errors of zero
CCFNORM
normalized cross-correlations
CCFPROB
cross-correlation probabilities
CCFLPROB
cross-correlation log probabilities
If none of the cross-correlation statistics are specified, the default is as follows: crosscorr lag n ccov ccf ccfstd;
The following options can be specified in the CROSSCORR statement following the slash (/): NLAG= number
specifies the number of lags to be stored in the OUTCROSSCORR= data set or to be plotted. The default is 24 or three times the length of the seasonal cycle, whichever is smaller. The LAGS= option takes precedence over the NLAG= option. LAGS=( numlist )
specifies a list of lags to be stored in the OUTCROSSCORR= data set or to be plotted. The list of lags must be separated by spaces or commas. For example, LAGS=(1,3) specifies the first and third lags. TRANSPOSE= NO|YES
specifies which values are recorded as column names in the OUTCROSSCORR= data set. TRANSPOSE=YES specifies that the lags be recorded as the column names instead of the cross-correlation statistics. The TRANSPOSE=NO option is useful for graphing the crosscorrelation results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the cross-correlation results with other procedures such as the CLUSTER procedure of SAS/STAT or SAS Enterprise Miner software. The default is TRANSPOSE=NO.
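For example, the following sketch (based on the bank transactions example; the output data set name CCSTATS is arbitrary) computes cross-correlations between the VAR series and the CROSSVAR series:

   /* cross-correlate withdrawals (VAR) with deposits (CROSSVAR)
      for the first 12 lags */
   proc timeseries data=transactions outcrosscorr=ccstats
                   crossplots=(ccf);
      by customer;
      id date interval=day accumulate=total;
      crosscorr lag ccf ccfstd / nlag=12;
      var withdrawals;
      crossvar deposits;
   run;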
DECOMP Statement

   DECOMP components < / options > ;
A DECOMP statement can be used with the TIMESERIES procedure to specify options related to classical seasonal decomposition of the time series data. Only one DECOMP statement is allowed. The options specified affect all variables listed in the VAR statements. Decomposition can be performed only when the length of the seasonal cycle specified by the PROC TIMESERIES statement SEASONALITY= option or implied by the ID statement INTERVAL= option is greater than one. The following seasonal decomposition components are available: ORIG | ORIGINAL
original series
TCC | TRENDCYCLE
trend-cycle component
SIC | SEASONIRREGULAR
seasonal-irregular component
SC | SEASONAL
seasonal component
SCSTD
seasonal component standard errors
TCS | TRENDCYCLESEASON
trend-cycle-seasonal component
IC | IRREGULAR
irregular component
SA | ADJUSTED
seasonally adjusted series
PCSA
percent change seasonally adjusted series
TC
trend component
CC | CYCLE
cycle component
If none of the components are specified, the default is as follows: decomp orig tcc sc ic sa;
The following options can be specified in the DECOMP statement following the slash (/): MODE= option
specifies the type of decomposition to be used to decompose the time series. The following values can be specified for the MODE= option: ADD | ADDITIVE
additive decomposition
MULT | MULTIPLICATIVE
multiplicative decomposition
LOGADD | LOGADDITIVE
log-additive decomposition
PSEUDOADD | PSEUDOADDITIVE
pseudo-additive decomposition
MULTORADD
multiplicative or additive decomposition, depending on data
Multiplicative and log-additive decomposition require strictly positive time series. If the accumulated time series contains nonpositive values and the MODE=MULT or MODE=LOGADD option is specified, an error results. Pseudo-additive decomposition requires a nonnegative-valued time series. If the accumulated time series contains negative values and the MODE=PSEUDOADD option is specified, an error results. The MODE=MULTORADD option specifies that multiplicative decomposition be used when the accumulated time series contains only positive values, that pseudo-additive decomposition be used when the accumulated time series contains only nonnegative values, and that additive decomposition be used otherwise. The default is MODE=MULTORADD. LAMBDA= number
specifies the Hodrick-Prescott filter parameter for trend-cycle decomposition. The default is LAMBDA=1600. Filtering applies when the trend component or the cycle component is requested. If filtering is not specified, this option is ignored. NPERIODS= number
specifies the number of time periods to be stored in the OUTDECOMP= data set when the TRANSPOSE=YES option is specified. If the TRANSPOSE=NO option is specified, the
NPERIODS= option is ignored. If the NPERIODS= option is positive, the first or beginning time periods are recorded. If the NPERIODS= option is negative, the last or ending time periods are recorded. The NPERIODS= option specifies the number of OUTDECOMP= data set variables to contain the seasonal decomposition and is therefore limited to the maximum allowed number of SAS variables. If the number of time periods exceeds this limit, a warning is printed in the log and the number of periods stored is reduced to the limit. If the NPERIODS= option is not specified, all of the periods specified between the ID statement START= and END= options are stored. If at least one of the START= or END= options is not specified, the default magnitude is the seasonality specified by the SEASONALITY= option in the PROC TIMESERIES statement or implied by the INTERVAL= option in the ID statement. If only the START= option or both the START= and END= options are specified and the seasonality is zero, the default is NPERIODS=5. If only the END= option or neither the START= nor END= option is specified and the seasonality is zero, the default is NPERIODS=-5. TRANSPOSE= NO | YES
specifies which values are recorded as column names in the OUTDECOMP= data set. TRANSPOSE=YES specifies that the time periods be recorded as the column names instead of the statistics. The first and last time periods stored in the OUTDECOMP= data set correspond to the period of the ID statement START= option and END= option, respectively. If only the ID statement END= option is specified, the last time ID value of each accumulated time series corresponds to the last time period column. If only the ID statement START= option is specified, the first time ID value of each accumulated time series corresponds to the first time period column. If neither the START= option nor the END= option is specified with the ID statement, the first time ID value of each accumulated time series corresponds to the first time period column. The TRANSPOSE=NO option is useful for analyzing or displaying the decomposition results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the decomposition results with other SAS procedures or SAS Enterprise Miner software. The default is TRANSPOSE=NO.
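For example, the following sketch (based on the bank transactions example; the output data set name DECOMPOSE is arbitrary) performs an additive decomposition; requesting the TC and CC components invokes the Hodrick-Prescott filter with the LAMBDA= parameter:

   /* additive decomposition with trend and cycle components
      extracted by the Hodrick-Prescott filter */
   proc timeseries data=transactions outdecomp=decompose;
      by customer;
      id date interval=month accumulate=total;
      decomp orig tc cc sa / mode=add lambda=1600;
      var withdrawals;
   run;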
ID Statement

   ID variable INTERVAL=interval < options > ;
The ID statement names a numeric variable that identifies observations in the input and output data sets. The ID variable's values are assumed to be SAS date or datetime values. In addition, the ID statement specifies the (desired) frequency associated with the time series. The ID statement options also specify how the observations are accumulated and how the time ID values are aligned to form the time series. The information specified affects all variables listed in subsequent VAR statements. If the ID statement is specified, the INTERVAL= option must also be specified. If an ID statement is not specified, the observation number, with respect to the BY group, is used as the time ID. The following options can be used with the ID statement: ACCUMULATE= option
specifies how the data set observations are to be accumulated within each time period. The
frequency (width of each time interval) is specified by the INTERVAL= option. The ID variable contains the time ID values. Each time ID variable value corresponds to a specific time period. The accumulated values form the time series, which is used in subsequent analysis. The ACCUMULATE= option is useful when there are zero or more than one input observations that coincide with a particular time period (for example, time-stamped transactional data). The EXPAND procedure offers additional frequency conversions and transformations that can also be useful in creating a time series. The following options determine how the observations are accumulated within each time period based on the ID variable and the frequency specified by the INTERVAL= option: NONE
No accumulation occurs; the ID variable values must be equally spaced with respect to the frequency. This is the default option.
TOTAL
Observations are accumulated based on the total sum of their values.
AVERAGE | AVG
Observations are accumulated based on the average of their values.
MINIMUM | MIN
Observations are accumulated based on the minimum of their values.
MEDIAN | MED
Observations are accumulated based on the median of their values.
MAXIMUM | MAX
Observations are accumulated based on the maximum of their values.
N
Observations are accumulated based on the number of nonmissing observations.
NMISS
Observations are accumulated based on the number of missing observations.
NOBS
Observations are accumulated based on the number of observations.
FIRST
Observations are accumulated based on the first of their values.
LAST
Observations are accumulated based on the last of their values.
STDDEV |STD
Observations are accumulated based on the standard deviation of their values.
CSS
Observations are accumulated based on the corrected sum of squares of their values.
USS
Observations are accumulated based on the uncorrected sum of squares of their values.
If the ACCUMULATE= option is specified, the SETMISSING= option is useful for specifying how accumulated missing values are to be treated. If missing values should be interpreted as zero, then SETMISSING=0 should be used. The section “Details: TIMESERIES Procedure” on page 1876 describes accumulation in greater detail. ALIGN= option
controls the alignment of SAS dates used to identify output observations. The ALIGN= option accepts the following values: BEGINNING | BEG | B, MIDDLE | MID | M, and ENDING | END | E. BEGINNING is the default.
BOUNDARYALIGN= option
controls how the ACCUMULATE= option is processed for the two boundary time intervals, which include the START= and END= time ID values. Some time ID values might fall inside the first and last accumulation intervals but fall outside the START= and END= boundaries. In these cases the BOUNDARYALIGN= option determines which values to include in the accumulation operation. You can specify the following options: NONE
No values outside the START= and END= boundaries are accumulated.
START
All observations in the first time interval are accumulated.
END
All observations in the last time interval are accumulated.
BOTH
All observations in the first and last time intervals are accumulated.
If no option is specified, the default value BOUNDARYALIGN=NONE is used. The section “Details: TIMESERIES Procedure” on page 1876 describes the BOUNDARYALIGN= accumulation option in greater detail. END= option
specifies a SAS date or datetime value that represents the end of the data. If the last time ID variable value is less than the END= value, the series is extended with missing values. If the last time ID variable value is greater than the END= value, the series is truncated. For example, END="&sysdate"d uses the automatic macro variable SYSDATE to extend or truncate the series to the current date. The START= and END= options can be used to ensure that data associated with each BY group contains the same number of observations. FORMAT= format
specifies the SAS format for the time ID values. If the FORMAT= option is not specified, the default format is implied from the INTERVAL= option. INTERVAL= interval
specifies the frequency of the accumulated time series. For example, if the input data set consists of quarterly observations, then INTERVAL=QTR should be used. If the PROC TIMESERIES statement SEASONALITY= option is not specified, the length of the seasonal cycle is implied from the INTERVAL= option. For example, INTERVAL=QTR implies a seasonal cycle of length 4. If the ACCUMULATE= option is also specified, the INTERVAL= option determines the time periods for the accumulation of observations. The INTERVAL= option is required and must be the first option specified in the ID statement. NOTSORTED
specifies that the time ID values not be in sorted order. The TIMESERIES procedure sorts the data with respect to the time ID prior to analysis. SETMISSING= option | number
specifies how missing values (either actual or accumulated) are to be interpreted in the accumulated time series. If a number is specified, missing values are set to the number. If a missing value indicates an unknown value, this option should not be used. If a missing value indicates no value, SETMISSING=0 should be used. You would typically use SETMISSING=0 for transactional data because no recorded data usually implies no activity. The following options can also be used to determine how missing values are assigned:
MISSING
Missing values are set to missing. This is the default option.
AVERAGE | AVG
Missing values are set to the accumulated average value.
MINIMUM | MIN
Missing values are set to the accumulated minimum value.
MEDIAN | MED
Missing values are set to the accumulated median value.
MAXIMUM | MAX
Missing values are set to the accumulated maximum value.
FIRST
Missing values are set to the accumulated first nonmissing value.
LAST
Missing values are set to the accumulated last nonmissing value.
PREVIOUS | PREV
Missing values are set to the previous period’s accumulated nonmissing value. Missing values at the beginning of the accumulated series remain missing.
NEXT
Missing values are set to the next period’s accumulated nonmissing value. Missing values at the end of the accumulated series remain missing.
START= option
specifies a SAS date or datetime value that represents the beginning of the data. If the first time ID variable value is greater than the START= value, the series is prepended with missing values. If the first time ID variable value is less than the START= value, the series is truncated. The START= and END= options can be used to ensure that data associated with each BY group contains the same number of observations.
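Putting several ID statement options together, the following sketch (based on the bank transactions example; the date constants are illustrative) accumulates daily totals, interprets accumulated missing values as zero, and pads or truncates each BY group to a common date range:

   /* fix the series range so every BY group has the same number
      of observations, and treat missing periods as zero */
   proc timeseries data=transactions out=timeseries;
      by customer;
      id date interval=day accumulate=total setmissing=0
         start='01jan1999'd end='31dec1999'd;
      var withdrawals deposits;
   run;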
SEASON Statement

   SEASON statistics < / options > ;
A SEASON statement can be used with the TIMESERIES procedure to specify options that are related to seasonal analysis of the time-stamped transactional data. Only one SEASON statement is allowed. The options specified affect all variables specified in the VAR statements. Seasonal analysis can be performed only when the length of the seasonal cycle specified by the PROC TIMESERIES statement SEASONALITY= option or implied by the ID statement INTERVAL= option is greater than one. The following seasonal statistics are available: NOBS
number of observations
N
number of nonmissing observations
NMISS
number of missing observations
MINIMUM
minimum value
MAXIMUM
maximum value
RANGE
range value
SUM
summation value
MEAN
mean value
STDDEV
standard deviation
CSS
corrected sum of squares
USS
uncorrected sum of squares
MEDIAN
median value
If none of the season statistics are specified, the default is as follows: season n min max mean std;
The following option can be specified in the SEASON statement following the slash (/): TRANSPOSE= NO | YES
specifies which values are recorded as column names in the OUTSEASON= data set. TRANSPOSE=YES specifies that the seasonal indices be recorded as the column names instead of the statistics. The TRANSPOSE=NO option is useful for graphing the seasonal analysis results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the seasonal analysis results with SAS procedures or SAS Enterprise Miner software. The default is TRANSPOSE=NO.
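For example, the following sketch (based on the bank transactions example; the output data set name SEASON is arbitrary) computes day-of-week statistics and transposes them so that the seasonal indices become the column names:

   /* day-of-week seasonal statistics, one column per season */
   proc timeseries data=transactions outseason=season;
      by customer;
      id date interval=day accumulate=total;
      season n min max mean std / transpose=yes;
      var withdrawals;
   run;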
SPECTRA Statement

   SPECTRA statistics < / options > ;
A SPECTRA statement can be used with the TIMESERIES procedure to specify which statistics appear in the OUTSPECTRA= data set. The SPECTRA statement options are used in performing a spectral analysis on the variables listed in the VAR statement. These options affect values that are produced in the PROC TIMESERIES statement’s OUTSPECTRA= data set, and in the periodogram and spectral density estimate. Only one SPECTRA statement is allowed. The following univariate frequency domain statistics are available: FREQ
frequency in radians from 0 to π
PERIOD
period or wavelength
COS
cosine transform
SIN
sine transform
P
periodogram
S
spectral density estimates
If none of the frequency domain statistics are specified, the default is as follows: spectra period p;
The following options can be specified in the SPECTRA statement following the slash (/): ADJMEAN | CENTER
subtracts the series mean before performing the Fourier decomposition. This sets the first periodogram ordinate to 0 rather than to 2n times the squared mean. This option is commonly used when the periodograms are to be plotted to prevent a large first periodogram ordinate from distorting the scale of the plot. ALPHA= num
specifies the width of a window drawn around the spectral density estimate in a spectral density versus frequency plot. Based on approximations proposed by Brockwell and Davis (1991), periodogram ordinates fall within this window with a confidence level of 1 – ALPHA. The value of ALPHA must be between 0 and 1; the default is 0.5. kernel DOMAIN=domain C=c EXP|EXPON=e
specifies the smoothing function used to calculate a spectral density estimate as the moving average of periodogram ordinates. The kernel function is an alternative way to using the WEIGHTS option as a smoothing function. The available kernel values are: PARZEN
Parzen kernel
BART | BARTLETT
Bartlett kernel
TUK | TUKEY
Tukey-Hanning kernel
TRUNC | TRUNCAT
truncated kernel
QS | QUADR
quadratic spectral kernel
The DOMAIN= option specifies how the smoothing function is interpreted. The available domain values are: FREQUENCY
smooths the periodogram ordinates.
TIME
applies the kernel as a filter to the time series autocovariance function.
By default DOMAIN=FREQUENCY, and smoothing is applied in the same manner as weights are applied when the WEIGHTS= option is used. Each of the kernel functions can be further parameterized by a bandwidth value by using the C= and EXPON= options. A summary of the default values of the bandwidth parameters, c and e, that are associated with the kernel functions and the bandwidth values, M, for a series with 100 periodogram ordinates is listed in Table 29.2.

Table 29.2  Default Bandwidth Parameters

Kernel            c      e      M
Bartlett          1/2    1/3    2.32
Parzen            1      1/5    2.51
Quadratic         1/2    1/5    1.26
Tukey-Hanning     2/3    1/5    1.67
Truncated         1/4    1/5    0.63
For example, to apply the truncated kernel by using default bandwidth parameters in the frequency domain, the following SPECTRA statement could be used: spectra / truncat;
Details of the kernel function bandwidth parameterization and the DOMAIN= option are provided in the section “Using Kernel Specifications” on page 1886. WEIGHTS numlist
specifies the relative weights used in computing a spectral density estimate as the moving average smoothing of periodogram ordinates. If neither a WEIGHTS option nor a kernel function is specified, the spectral density estimate is identical to the unmodified periodogram. The following SPECTRA statement uses the WEIGHTS option to specify equal weighting for each of the three adjacent periodogram ordinates centered on each spectral density estimate: spectra / weights 1 1 1;
Further description of how the weights are applied is provided in the section “Using Specification of Weight Constants” on page 1886.
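Putting these options together, the following sketch (based on the bank transactions example; the output data set name SPECTRA is arbitrary) centers each series and smooths the periodogram with a Parzen kernel to obtain spectral density estimates:

   /* centered Fourier analysis with Parzen-kernel smoothing */
   proc timeseries data=transactions outspectra=spectra
                   plots=(periodogram spectrum);
      by customer;
      id date interval=day accumulate=total;
      spectra freq period p s / adjmean parzen;
      var withdrawals;
   run;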
SSA Statement

   SSA < / options > ;
An SSA statement can be used with the TIMESERIES procedure to specify options that are related to singular spectrum analysis (SSA) of the accumulated time series. Only one SSA statement is allowed. The following options can be specified in the SSA statement following the slash (/). GROUPS= (numlist) ... (numlist)
specifies the lists that categorize window lags into groups. The window lags must be separated by spaces or commas. For example, GROUPS=(1,3) (2,4) specifies that the first and third window lags form the first group and the second and fourth window lags form the second group. If no GROUPS= option is specified, the window lags are divided into two groups based on the THRESHOLDPCT= value. For example, the following SSA statement specifies three groups: ssa / groups=(1 3)(2 4 5)(6);
The first group contains the first and third principal components; the second group contains the second, fourth, and fifth principal components; and the third group contains the sixth principal component.
By default, the first group contains the principal components whose contribution to variability in the series sums to greater than the THRESHOLDPCT= value of 90%, and the second group contains the remaining components. LENGTH = number
specifies the window length to be used in the analysis. It represents the maximum lag used in the SSA autocovariance calculations. The number specified by the LENGTH= option must be greater than one and less than 1000. When the SEASONALITY= option is provided or is inferred from the INTERVAL= option in the ID statement, the default window length is the smaller of two times the length of the seasonal cycle and one half the length of the time series. When no seasonality value is available, the default window length is the smaller of 12 and one half the length of the time series. For example, the following SSA statement specifies a window length of 10:
If no window length option is specified and the INTERVAL=MONTH or SEASONALITY=12 options are specified, a window length of 24 is used. If the specified window length is greater than one-half the length of the accumulated time series, the window length is reduced and a warning message is printed to the log. NPERIODS= number
specifies the number of time periods to be stored in the OUTSSA= data set when the TRANSPOSE=YES option is specified. If the TRANSPOSE option is not specified, the NPERIODS= option is ignored. The NPERIODS= option specifies the number of OUTSSA= data set variables to contain the groups. If the NPERIODS= option is not specified, all of the periods specified between the ID statement START= and END= options are stored. If at least one of the START= or END= options is not specified, the default magnitude is the seasonality specified by the SEASONALITY= option in the PROC TIMESERIES statement or implied by the INTERVAL= option in the ID statement. If only the START= option or both the START= and END= options are specified and the seasonality is zero, the default is NPERIODS=5. If only the END= option or neither the START= nor END= option is specified and the seasonality is zero, the default is NPERIODS=-5. THRESHOLDPCT= percent
specifies a percentage used to divide the SSA components into two groups based on the cumulative percentage of their singular values. The percentage specified by the THRESHOLDPCT= option must be greater than zero and less than 100. The default is THRESHOLDPCT=90. For example, the following SSA statement specifies 80%: ssa / THRESHOLDPCT=80;
The size of the second group must be at least one, and it must be less than the window length. The percentage is adjusted to achieve this requirement.
For example, the following SSA statement specifies a THRESHOLDPCT= of 0%, which effectively sets the size of the second group to one less than the window length: ssa / THRESHOLDPCT = 0;
The following SSA statement specifies a THRESHOLDPCT= of 100%, which implies that the size of the last group is one: ssa / THRESHOLDPCT= 100;
TRANSPOSE= NO | YES
specifies which values are recorded as column names in the OUTSSA= data set. TRANSPOSE=YES specifies that the time periods be recorded as the column names instead of the specified groups as the column names. The first and last time period stored in the OUTSSA= data set corresponds to the period of the ID statement START= and END= options, respectively. If only the ID statement END= option is specified, the last time ID value of each accumulated time series corresponds to the last time period column. If only the ID statement START= option is specified, the first time ID value of each accumulated time series corresponds to the first time period column. If neither the START= option nor the END= option is specified with the ID statement, the first time ID value of each accumulated time series corresponds to the first time period column. The TRANSPOSE=NO option is useful for displaying the SSA results. The TRANSPOSE=YES option is useful for analyzing the SSA results using SAS Enterprise Miner software. The default is TRANSPOSE=NO.
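For example, the following sketch (based on the bank transactions example; the output data set name SSA is arbitrary) uses a window length of 14, two seasonal cycles for daily data, and an 80% threshold to separate the dominant components from the remainder:

   /* singular spectrum analysis with an explicit window length
      and threshold-based grouping */
   proc timeseries data=transactions outssa=ssa plots=(ssa);
      by customer;
      id date interval=day accumulate=total;
      ssa / length=14 thresholdpct=80;
      var withdrawals;
   run;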
TREND Statement

   TREND statistics < / options > ;
A TREND statement can be used with the TIMESERIES procedure to specify options related to trend analysis of the time-stamped transactional data. Only one TREND statement is allowed. The options specified affect all variables specified in the VAR statements. The following trend statistics are available: NOBS
number of observations
N
number of nonmissing observations
NMISS
number of missing observations
MINIMUM
minimum value
MAXIMUM
maximum value
RANGE
range value
SUM
summation value
MEAN
mean value
STDDEV
standard deviation
CSS
corrected sum of squares
USS
uncorrected sum of squares
MEDIAN
median value
If none of the trend statistics are specified, the default is as follows: trend n min max mean std;
The following options can be specified in the TREND statement following the slash (/): NPERIODS= number
specifies the number of time periods to be stored in the OUTTREND= data set when the TRANSPOSE=YES option is specified. If the TRANSPOSE option is not specified, the NPERIODS= option is ignored. The NPERIODS= option specifies the number of OUTTREND= data set variables to contain the trend statistics and is therefore limited to the maximum allowed number of SAS variables. If the NPERIODS= option is not specified, all of the periods specified between the ID statement START= and END= options are stored. If at least one of the START= or END= options is not specified, the default magnitude is the seasonality specified by the SEASONALITY= option in the PROC TIMESERIES statement or implied by the INTERVAL= option in the ID statement. If only the START= option or both the START= and END= options are specified and the seasonality is zero, the default is NPERIODS=5. If only the END= option or neither the START= nor END= option is specified and the seasonality is zero, the default is NPERIODS=-5. TRANSPOSE= NO | YES
specifies which values are recorded as column names in the OUTTREND= data set. TRANSPOSE=YES specifies that the time periods be recorded as the column names instead of the statistics as the column names. The first and last time periods stored in the OUTTREND= data set correspond to the period of the ID statement START= and END= options, respectively. If only the ID statement END= option is specified, the last time ID value of each accumulated time series corresponds to the last time period column. If only the ID statement START= option is specified, the first time ID value of each accumulated time series corresponds to the first time period column. If neither the START= option nor the END= option is specified with the ID statement, the first time ID value of each accumulated time series corresponds to the first time period column. The TRANSPOSE=NO option is useful for analyzing or displaying the trend analysis results with SAS/GRAPH procedures. The TRANSPOSE=YES option is useful for analyzing the trend analysis results with other SAS procedures or SAS Enterprise Miner software. The default is TRANSPOSE=NO.
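For example, the following sketch (based on the bank transactions example; the output data set name TREND is arbitrary) stores the trend statistics for the last five time periods in transposed form:

   /* trend statistics, transposed, keeping only the ending
      five time periods */
   proc timeseries data=transactions outtrend=trend;
      by customer;
      id date interval=day accumulate=total;
      trend n min max mean std / transpose=yes nperiods=-5;
      var withdrawals;
   run;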
VAR and CROSSVAR Statements

   VAR variable-list < / options > ;
   CROSSVAR variable-list < / options > ;
The VAR and CROSSVAR statements list the numeric variables in the DATA= data set whose values are to be accumulated to form the time series. An input data set variable can be specified in only one VAR or CROSSVAR statement. Any number of VAR and CROSSVAR statements can be used. The following options can be used with the VAR and CROSSVAR statements: ACCUMULATE= option
specifies how the data set observations are to be accumulated within each time period for the variables listed in the VAR or CROSSVAR statement. If the ACCUMULATE= option is not specified in the VAR or CROSSVAR statement, accumulation is determined by the ACCUMULATE= option of the ID statement. See the ID statement ACCUMULATE= option for more details. DIF=( numlist )
specifies the differencing to be applied to the accumulated time series. The list of differencing orders must be separated by spaces or commas. For example, DIF=(1,3) specifies first then third order differencing. Differencing is applied after time series transformation. The TRANSFORM= option is applied before the DIF= option. SDIF=( numlist )
specifies the seasonal differencing to be applied to the accumulated time series. The list of seasonal differencing orders must be separated by spaces or commas. For example, SDIF=(1,3) specifies first then third order seasonal differencing. Differencing is applied after time series transformation. The TRANSFORM= option is applied before the SDIF= option. SETMISS= option | number SETMISSING= option | number
specifies how missing values (either actual or accumulated) are to be interpreted in the accumulated time series for variables listed in the VAR or CROSSVAR statement. If the SETMISSING= option is not specified in the VAR or CROSSVAR statement, missing values are set based on the SETMISSING= option of the ID statement. See the ID statement SETMISSING= option for more details. TRANSFORM= option
specifies the time series transformation to be applied to the accumulated time series. The following transformations are provided: NONE
No transformation is applied. This option is the default.
LOG
logarithmic transformation
SQRT
square-root transformation
LOGISTIC
logistic transformation
BOXCOX(n )
Box-Cox transformation with parameter number where the number is between –5 and 5
When the TRANSFORM= option is specified, the time series must be strictly positive.
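Combining these options, the following sketch (based on the bank transactions example; the output data set name WORKSERIES is arbitrary) logs, first-differences, and seasonally differences the withdrawals series while accumulating the deposits series by a different statistic; the transformation is applied before the differencing:

   /* per-variable accumulation, transformation, and differencing;
      TRANSFORM=LOG requires a strictly positive series */
   proc timeseries data=transactions out=workseries;
      by customer;
      id date interval=day accumulate=total;
      var withdrawals / transform=log dif=(1) sdif=(1);
      crossvar deposits / accumulate=average;
   run;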
Details: TIMESERIES Procedure

The TIMESERIES procedure can be used to perform trend and seasonal analysis on transactional data. For trend analysis, various sample statistics are computed for each time period defined by the time ID variable and INTERVAL= option. For seasonal analysis, various sample statistics are computed for each season defined by the INTERVAL= or the SEASONALITY= option. For example, if the transactional data ranges from June 1990 to January 2000 and the data are to be accumulated on a monthly basis, then the trend statistics are computed for every month: June 1990, July 1990, ..., January 2000. The seasonal statistics are computed for each season: January, February, ..., December.

The TIMESERIES procedure can be used to form time series data from transactional data. The accumulated time series can then be analyzed using time series techniques. The data is analyzed in the following order:

1. accumulation (ACCUMULATE= option in the ID, VAR, or CROSSVAR statement)

2. missing value interpretation (SETMISSING= option in the ID, VAR, or CROSSVAR statement)

3. time series transformation (TRANSFORM= option in the VAR or CROSSVAR statement)

4. time series differencing (DIF= and SDIF= options in the VAR or CROSSVAR statement)

5. descriptive statistics (OUTSUM= option and the PRINT=DESCSTATS option)

6. seasonal decomposition (DECOMP statement or the OUTDECOMP= option in the PROC TIMESERIES statement)

7. correlation analysis (CORR statement or the OUTCORR= option in the PROC TIMESERIES statement)

8. singular spectrum analysis (SSA statement or the OUTSSA= option in the PROC TIMESERIES statement)

9. Fourier spectral analysis (SPECTRA statement or the OUTSPECTRA= option in the PROC TIMESERIES statement)

10. cross-correlation analysis (CROSSCORR statement or the OUTCROSSCORR= option in the PROC TIMESERIES statement)
Accumulation

If the ACCUMULATE= option in the ID, VAR, or CROSSVAR statement is specified, data set observations are accumulated within each time period. The frequency (width of each time interval)
is specified by the ID statement INTERVAL= option. The ID variable contains the time ID values. Each time ID value corresponds to a specific time period. Accumulation is useful when the input data set contains transactional data, whose observations are not spaced with respect to any particular time interval. The accumulated values form the time series, which is used in subsequent analyses.

For example, suppose a data set contains the following observations:

   19MAR1999    10
   19MAR1999    30
   11MAY1999    50
   12MAY1999    20
   23MAY1999    20

If INTERVAL=MONTH is specified, all of the above observations fall within a three-month period between March 1999 and May 1999. The observations are accumulated within each time period as follows:

If the ACCUMULATE=NONE option is specified, an error is generated because the ID variable values are not equally spaced with respect to the specified frequency (MONTH).

If the ACCUMULATE=TOTAL option is specified, the resulting time series is:

   01MAR1999    40
   01APR1999     .
   01MAY1999    90

If the ACCUMULATE=AVERAGE option is specified, the resulting time series is:

   01MAR1999    20
   01APR1999     .
   01MAY1999    30

If the ACCUMULATE=MINIMUM option is specified, the resulting time series is:

   01MAR1999    10
   01APR1999     .
   01MAY1999    20

If the ACCUMULATE=MEDIAN option is specified, the resulting time series is:

   01MAR1999    20
   01APR1999     .
   01MAY1999    20

If the ACCUMULATE=MAXIMUM option is specified, the resulting time series is:

   01MAR1999    30
   01APR1999     .
   01MAY1999    50

If the ACCUMULATE=FIRST option is specified, the resulting time series is:

   01MAR1999    10
   01APR1999     .
   01MAY1999    50

If the ACCUMULATE=LAST option is specified, the resulting time series is:

   01MAR1999    30
   01APR1999     .
   01MAY1999    20

If the ACCUMULATE=STDDEV option is specified, the resulting time series is:

   01MAR1999    14.14
   01APR1999      .
   01MAY1999    17.32
As can be seen from the above examples, even though the data set observations contain no missing values, the accumulated time series can have missing values.
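The accumulation behavior can be verified with a small sketch. The following DATA step recreates the example observations (the data set and variable names are illustrative), and the PROC TIMESERIES step reproduces the ACCUMULATE=TOTAL result shown above:

   /* recreate the example transactions */
   data work.sample;
      input date : date9. amount;
      format date date9.;
   datalines;
   19MAR1999 10
   19MAR1999 30
   11MAY1999 50
   12MAY1999 20
   23MAY1999 20
   ;

   /* accumulate to monthly totals: 40, ., 90 */
   proc timeseries data=work.sample out=work.monthly;
      id date interval=month accumulate=total;
      var amount;
   run;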
Boundary Alignment

When the BOUNDARYALIGN= option is used to qualify the START= or END= options, additional time series values can be incorporated into the accumulation operation. For instance, suppose a data set contains the following observations:

   01JAN1999    10
   01FEB1999    10
   01MAR1999    10
   01APR1999    10
   01MAY1999    10
   01JUN1999    10

If the options START='01FEB1999'd, END='01APR1999'd, INTERVAL=QUARTER, and ACCUMULATE=TOTAL are specified, the BOUNDARYALIGN= option determines the accumulated time series as follows:

If BOUNDARYALIGN=START is specified, the accumulated time series is:

   01JAN1999    30
   01APR1999    10

If BOUNDARYALIGN=END is specified, the accumulated time series is:

   01JAN1999    20
   01APR1999    30

If BOUNDARYALIGN=BOTH is specified, the accumulated time series is:

   01JAN1999    30
   01APR1999    30

If BOUNDARYALIGN=NONE is specified, the accumulated time series is:

   01JAN1999    20
   01APR1999    10
Missing Value Interpretation

Sometimes missing values should be interpreted as unknown values. But sometimes missing values are known; for example, when missing values are created by accumulation, a period with no observations can often be interpreted as having no value, that is, zero. The SETMISSING= option can be used to specify how missing values are to be interpreted. The SETMISSING=0 option should be used when missing observations are to be treated as no (zero) values. In other cases, missing values can be interpreted as global values, such as the minimum or maximum values of the accumulated series. The accumulated and interpreted time series is used in subsequent analyses.
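A brief sketch, using the same hypothetical WORK.TRANSACTIONS data set as above, in which months created as missing by accumulation are treated as zero:

proc timeseries data=work.transactions out=series;
   /* months with no transactions become 0 rather than missing */
   id date interval=month accumulate=total setmissing=0;
   var sales;
run;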
Time Series Transformation

There are four transformations available, for strictly positive series only. Let $y_t > 0$ be the original time series, and let $w_t$ be the transformed series. The transformations are defined as follows:

Log
    is the logarithmic transformation:
    $$w_t = \ln(y_t)$$

Logistic
    is the logistic transformation:
    $$w_t = \ln\!\left(\frac{c\,y_t}{1 - c\,y_t}\right)$$
    where the scaling factor $c$ is
    $$c = (1 - 10^{-6})\,10^{-\operatorname{ceil}(\log_{10}(\max(y_t)))}$$
    and $\operatorname{ceil}(x)$ is the smallest integer greater than or equal to $x$.

Square root
    is the square root transformation:
    $$w_t = \sqrt{y_t}$$

Box-Cox
    is the Box-Cox transformation:
    $$w_t = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt] \ln(y_t), & \lambda = 0 \end{cases}$$
More complex time series transformations can be performed by using the EXPAND procedure of SAS/ETS.
Time Series Differencing

After optionally transforming the series, the accumulated series can be simply or seasonally differenced by using the VAR and CROSSVAR statement DIF= and SDIF= options. For example, suppose $y_t$ is a monthly time series. The following examples of the DIF= and SDIF= options demonstrate how to simply and seasonally difference the time series:

   dif=(1) sdif=(1)

   dif=(1,12)

Additionally, when $y_t$ is strictly positive and the TRANSFORM=, DIF=, and SDIF= options are combined in the VAR and CROSSVAR statements, the transformation operation is performed before the differencing operations.
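A short sketch combining transformation and differencing (data set and variable names hypothetical, as before); per the note above, the log transformation is applied before the simple and seasonal differences:

proc timeseries data=work.transactions out=adjusted;
   id date interval=month accumulate=total;
   /* log, then first difference, then seasonal (lag 12) difference */
   var sales / transform=log dif=1 sdif=1;
run;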
Descriptive Statistics

Descriptive statistics can be computed from the working series by specifying the OUTSUM= option or the PRINT=DESCSTATS option.
Seasonal Decomposition

Seasonal decomposition/analysis can be performed on the working series by specifying the OUTDECOMP= option, the PRINT=DECOMP option, or one of the PLOTS= options associated with decomposition in the PROC TIMESERIES statement. The DECOMP statement enables you to specify options related to decomposition. The TIMESERIES procedure uses classical decomposition. More complex seasonal decomposition/adjustment analysis can be performed by using the X11 or the X12 procedure of SAS/ETS.

The DECOMP statement MODE= option determines the mode of the seasonal adjustment decomposition to be performed. There are four modes: multiplicative (MODE=MULT), additive (MODE=ADD), pseudo-additive (MODE=PSEUDOADD), and log-additive (MODE=LOGADD) decomposition. The default is MODE=MULTORADD, which specifies MODE=MULT for series that are strictly positive, MODE=PSEUDOADD for series that are nonnegative, and MODE=ADD for series that are not nonnegative. When MODE=LOGADD is specified, the components are exponentiated to the original metric.

The DECOMP statement LAMBDA= option specifies the Hodrick-Prescott filter parameter (Hodrick and Prescott 1980). The default is LAMBDA=1600. The Hodrick-Prescott filter is used to decompose the trend-cycle component into the trend component and cycle component in an additive fashion. A smaller parameter assigns less significance to the cycle; that is, LAMBDA=0 implies no cycle component.

The notation and keywords associated with seasonal decomposition/adjustment analysis are defined in Table 29.3.
Table 29.3  Seasonal Adjustment Formulas

original series (Keyword: ORIGINAL)
    MULT:       $O_t = TC_t\, S_t\, I_t$
    ADD:        $O_t = TC_t + S_t + I_t$
    LOGADD:     $\log(O_t) = TC_t + S_t + I_t$
    PSEUDOADD:  $O_t = TC_t\,(S_t + I_t - 1)$

trend-cycle component (Keyword: TCC)
    MULT:       centered moving average of $O_t$
    ADD:        centered moving average of $O_t$
    LOGADD:     centered moving average of $\log(O_t)$
    PSEUDOADD:  centered moving average of $O_t$

seasonal-irregular component (Keyword: SIC)
    MULT:       $SI_t = S_t\, I_t = O_t / TC_t$
    ADD:        $SI_t = S_t + I_t = O_t - TC_t$
    LOGADD:     $SI_t = S_t + I_t = \log(O_t) - TC_t$
    PSEUDOADD:  $SI_t = S_t + I_t - 1 = O_t / TC_t$

seasonal component (Keyword: SC)
    all modes:  seasonal averages of $SI_t$

irregular component (Keyword: IC)
    MULT:       $I_t = SI_t / S_t$
    ADD:        $I_t = SI_t - S_t$
    LOGADD:     $I_t = SI_t - S_t$
    PSEUDOADD:  $I_t = SI_t - S_t + 1$

trend-cycle-seasonal component (Keyword: TCS)
    MULT:       $TCS_t = TC_t\, S_t = O_t / I_t$
    ADD:        $TCS_t = TC_t + S_t = O_t - I_t$
    LOGADD:     $TCS_t = TC_t + S_t = \log(O_t) - I_t$
    PSEUDOADD:  $TCS_t = TC_t\, S_t$

trend component (Keyword: TC)
    all modes:  $T_t = TC_t - C_t$

cycle component (Keyword: CC)
    all modes:  $C_t = TC_t - T_t$

seasonally adjusted series (Keyword: SA)
    MULT:       $SA_t = O_t / S_t = TC_t\, I_t$
    ADD:        $SA_t = O_t - S_t = TC_t + I_t$
    LOGADD:     $SA_t = O_t / \exp(S_t) = \exp(TC_t + I_t)$
    PSEUDOADD:  $SA_t = TC_t\, I_t$
The trend-cycle component is computed from the $s$-period centered moving average as follows:
$$TC_t = \sum_{k=-\lfloor s/2 \rfloor}^{\lfloor s/2 \rfloor} y_{t+k} / s$$

The seasonal component is obtained by averaging the seasonal-irregular component for each season:
$$S_{k+js} = \sum_{t \equiv k \,(\mathrm{mod}\, s)} \frac{SI_t}{T/s}$$
where $0 \le j \le T/s$ and $1 \le k \le s$. The seasonal components are normalized to sum to one (multiplicative) or zero (additive).
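A short sketch of requesting a decomposition (the data set and variable names are hypothetical; the component keywords are those defined in Table 29.3, and MODE= and LAMBDA= are the DECOMP statement options described above):

proc timeseries data=work.transactions outdecomp=decomposition;
   id date interval=month accumulate=total;
   var sales;
   /* trend-cycle, seasonal, and irregular components;
      multiplicative mode; Hodrick-Prescott parameter 1600 */
   decomp tcc sc ic / mode=mult lambda=1600;
run;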
Correlation Analysis

Correlation analysis can be performed on the working series by specifying the OUTCORR= option or one of the PLOTS= options that are associated with correlation. The CORR statement enables you to specify options that are related to correlation analysis.
Autocovariance Statistics

LAGS      $h \in \{0, \ldots, H\}$
N         $N_h$ is the number of observed products at lag $h$, ignoring missing values
ACOV      $\hat{\gamma}(h) = \frac{1}{T} \sum_{t=h+1}^{T} (y_t - \bar{y})(y_{t-h} - \bar{y})$
ACOV      $\hat{\gamma}(h) = \frac{1}{N_h} \sum_{t=h+1}^{T} (y_t - \bar{y})(y_{t-h} - \bar{y})$ when embedded missing values are present

Autocorrelation Statistics

ACF       $\hat{\rho}(h) = \hat{\gamma}(h) / \hat{\gamma}(0)$
ACFSTD    $\mathrm{Std}(\hat{\rho}(h)) = \sqrt{\frac{1}{T}\left(1 + 2 \sum_{j=1}^{h-1} \hat{\rho}(j)^2\right)}$
ACFNORM   $\mathrm{Norm}(\hat{\rho}(h)) = \hat{\rho}(h) / \mathrm{Std}(\hat{\rho}(h))$
ACFPROB   $\mathrm{Prob}(\hat{\rho}(h)) = 2\,\bigl(1 - \Phi(|\mathrm{Norm}(\hat{\rho}(h))|)\bigr)$
ACFLPROB  $\mathrm{LogProb}(\hat{\rho}(h)) = -\log_{10}(\mathrm{Prob}(\hat{\rho}(h)))$
ACF2STD   $\mathrm{Flag}(\hat{\rho}(h)) = \begin{cases} 1 & \hat{\rho}(h) > 2\,\mathrm{Std}(\hat{\rho}(h)) \\ 0 & -2\,\mathrm{Std}(\hat{\rho}(h)) < \hat{\rho}(h) < 2\,\mathrm{Std}(\hat{\rho}(h)) \\ -1 & \hat{\rho}(h) < -2\,\mathrm{Std}(\hat{\rho}(h)) \end{cases}$
Partial Autocorrelation Statistics

PACF       $\hat{\varphi}(h)$, the partial autocorrelation coefficient at lag $h$, computed from the autocovariances $\{\hat{\gamma}(j)\}_{j=0}^{h}$
PACFSTD    $\mathrm{Std}(\hat{\varphi}(h)) = 1/\sqrt{N_0}$
PACFNORM   $\mathrm{Norm}(\hat{\varphi}(h)) = \hat{\varphi}(h) / \mathrm{Std}(\hat{\varphi}(h))$
PACFPROB   $\mathrm{Prob}(\hat{\varphi}(h)) = 2\,\bigl(1 - \Phi(|\mathrm{Norm}(\hat{\varphi}(h))|)\bigr)$
PACFLPROB  $\mathrm{LogProb}(\hat{\varphi}(h)) = -\log_{10}(\mathrm{Prob}(\hat{\varphi}(h)))$
PACF2STD   $\mathrm{Flag}(\hat{\varphi}(h)) = \begin{cases} 1 & \hat{\varphi}(h) > 2\,\mathrm{Std}(\hat{\varphi}(h)) \\ 0 & -2\,\mathrm{Std}(\hat{\varphi}(h)) < \hat{\varphi}(h) < 2\,\mathrm{Std}(\hat{\varphi}(h)) \\ -1 & \hat{\varphi}(h) < -2\,\mathrm{Std}(\hat{\varphi}(h)) \end{cases}$

Inverse Autocorrelation Statistics

IACF       $\hat{\pi}(h)$, the inverse autocorrelation at lag $h$ (denoted here $\hat{\pi}(h)$)
IACFSTD    $\mathrm{Std}(\hat{\pi}(h)) = 1/\sqrt{N_0}$
IACFNORM   $\mathrm{Norm}(\hat{\pi}(h)) = \hat{\pi}(h) / \mathrm{Std}(\hat{\pi}(h))$
IACFPROB   $\mathrm{Prob}(\hat{\pi}(h)) = 2\,\bigl(1 - \Phi(|\mathrm{Norm}(\hat{\pi}(h))|)\bigr)$
IACFLPROB  $\mathrm{LogProb}(\hat{\pi}(h)) = -\log_{10}(\mathrm{Prob}(\hat{\pi}(h)))$
IACF2STD   $\mathrm{Flag}(\hat{\pi}(h)) = \begin{cases} 1 & \hat{\pi}(h) > 2\,\mathrm{Std}(\hat{\pi}(h)) \\ 0 & -2\,\mathrm{Std}(\hat{\pi}(h)) < \hat{\pi}(h) < 2\,\mathrm{Std}(\hat{\pi}(h)) \\ -1 & \hat{\pi}(h) < -2\,\mathrm{Std}(\hat{\pi}(h)) \end{cases}$
White Noise Statistics

WN       $Q(h) = T(T+2) \sum_{j=1}^{h} \hat{\rho}(j)^2 / (T - j)$
WN       $Q(h) = \sum_{j=1}^{h} N_j\, \hat{\rho}(j)^2$ when embedded missing values are present
WNPROB   $\mathrm{Prob}(Q(h)) = \chi^2_{\max(1,\,h-p)}(Q(h))$
WNLPROB  $\mathrm{LogProb}(Q(h)) = -\log_{10}(\mathrm{Prob}(Q(h)))$
Cross-Correlation Analysis

Cross-correlation analysis can be performed on the working series by specifying the OUTCROSSCORR= option or one of the CROSSPLOTS= options that are associated with cross-correlation. The CROSSCORR statement enables you to specify options that are related to cross-correlation analysis.
Cross-Correlation Statistics

The cross-correlation statistics for the variable $x$ supplied in a VAR statement and variable $y$ supplied in a CROSSVAR statement are:

LAGS      $h \in \{0, \ldots, H\}$
N         $N_h$ is the number of observed products at lag $h$, ignoring missing values
CCOV      $\hat{\gamma}_{xy}(h) = \frac{1}{T} \sum_{t=h+1}^{T} (x_t - \bar{x})(y_{t-h} - \bar{y})$
CCOV      $\hat{\gamma}_{xy}(h) = \frac{1}{N_h} \sum_{t=h+1}^{T} (x_t - \bar{x})(y_{t-h} - \bar{y})$ when embedded missing values are present
CCF       $\hat{\rho}_{xy}(h) = \hat{\gamma}_{xy}(h) \big/ \sqrt{\hat{\gamma}_x(0)\, \hat{\gamma}_y(0)}$
CCFSTD    $\mathrm{Std}(\hat{\rho}_{xy}(h)) = 1/\sqrt{N_0}$
CCFNORM   $\mathrm{Norm}(\hat{\rho}_{xy}(h)) = \hat{\rho}_{xy}(h) / \mathrm{Std}(\hat{\rho}_{xy}(h))$
CCFPROB   $\mathrm{Prob}(\hat{\rho}_{xy}(h)) = 2\,\bigl(1 - \Phi(|\mathrm{Norm}(\hat{\rho}_{xy}(h))|)\bigr)$
CCFLPROB  $\mathrm{LogProb}(\hat{\rho}_{xy}(h)) = -\log_{10}(\mathrm{Prob}(\hat{\rho}_{xy}(h)))$
CCF2STD   $\mathrm{Flag}(\hat{\rho}_{xy}(h)) = \begin{cases} 1 & \hat{\rho}_{xy}(h) > 2\,\mathrm{Std}(\hat{\rho}_{xy}(h)) \\ 0 & -2\,\mathrm{Std}(\hat{\rho}_{xy}(h)) < \hat{\rho}_{xy}(h) < 2\,\mathrm{Std}(\hat{\rho}_{xy}(h)) \\ -1 & \hat{\rho}_{xy}(h) < -2\,\mathrm{Std}(\hat{\rho}_{xy}(h)) \end{cases}$
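A sketch of a cross-correlation request (data set, sales, and price are hypothetical; NLAG= is again assumed to set the number of lags):

proc timeseries data=work.transactions outcrosscorr=ccf;
   id date interval=month accumulate=total;
   var sales;        /* the x variable */
   crossvar price;   /* the y variable */
   crosscorr ccf / nlag=12;
run;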
Spectral Density Analysis

Spectral analysis can be performed on the working series by specifying the OUTSPECTRA= option or by specifying the PLOTS=PERIODOGRAM or PLOTS=SPECTRUM option in the PROC TIMESERIES statement.

PROC TIMESERIES uses the finite Fourier transform to decompose data series into a sum of sine and cosine terms of different amplitudes and wavelengths. The Fourier transform decomposition of the series $x_t$ is
$$x_t = \frac{a_0}{2} + \sum_{k=1}^{m} \left[ a_k \cos(\omega_k t) + b_k \sin(\omega_k t) \right]$$
where

$t$          is the time subscript, $t = 1, 2, \ldots, n$
$x_t$        are the equally spaced time series data
$n$          is the number of observations in the time series
$m$          is the number of frequencies in the Fourier decomposition: $m = \frac{n}{2}$ if $n$ is even, $m = \frac{n-1}{2}$ if $n$ is odd
$a_0$        is the mean term: $a_0 = 2\bar{x}$
$a_k$        are the cosine coefficients
$b_k$        are the sine coefficients
$\omega_k$   are the Fourier frequencies: $\omega_k = \frac{2\pi k}{n}$

Functions of the Fourier coefficients $a_k$ and $b_k$ can be plotted against frequency or against wavelength to form periodograms. The amplitude periodogram $J_k$ is defined as follows:
$$J_k = \frac{n}{2}\left(a_k^2 + b_k^2\right)$$

The Fourier decomposition is performed after the ACCUMULATE=, DIF=, SDIF=, and TRANSFORM= options in the ID and VAR statements have been applied.
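A sketch of requesting the periodogram and a kernel-smoothed spectral density estimate (data set hypothetical; the PARZEN, C=, and EXPON= options follow the kernel discussion below):

proc timeseries data=work.transactions outspectra=spec
                plots=(periodogram spectrum);
   id date interval=month accumulate=total;
   var sales;
   spectra / parzen c=4 expon=0;   /* Parzen kernel, fixed bandwidth M=4 */
run;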
Computational Method

If the number of observations, $n$, factors into prime integers that are less than or equal to 23, and the product of the square-free factors of $n$ is less than 210, then the procedure uses the fast Fourier transform developed by Cooley and Tukey (1965) and implemented by Singleton (1969). If $n$ cannot be factored in this way, then the procedure uses a Chirp-Z algorithm similar to that proposed by Monro and Branch (1976).
Missing Values

Missing values are replaced with an estimate of the mean to perform spectral analyses. This treatment of a series with missing values is consistent with the approach used by Priestley (1981).
Using Specification of Weight Constants

Any number of weighting constants can be specified. The constants are interpreted symmetrically about the middle weight. The middle constant (or the constant to the right of the middle if an even number of weight constants is specified) is the relative weight of the current periodogram ordinate. The constant immediately following the middle one is the relative weight of the next periodogram ordinate, and so on. The actual weights used in the smoothing process are the weights specified in the WEIGHTS option, scaled so that they sum to 1. The moving average calculation reflects at each end of the periodogram to accommodate the periodicity of the periodogram function.

For example, a simple triangular weighting can be specified using the following WEIGHTS option:

spectra / weights 1 2 3 2 1;
Using Kernel Specifications

You can specify one of ten different kernels in the SPECTRA statement. The two parameters $c \ge 0$ and $e \ge 0$ are used to compute the bandwidth parameter
$$M = c\,q^e$$
where $q$ is the number of periodogram ordinates + 1:
$$q = \lfloor n/2 \rfloor + 1$$
To specify the bandwidth explicitly, set $c$ equal to the desired bandwidth and $e = 0$. For example, a Parzen kernel with a support of 11 periodogram ordinates can be specified using the following kernel option:

spectra / parzen c=5 expon=0;
Kernels are used to smooth the periodogram by using a weighted moving average of nearby points. A smoothed periodogram is defined by the equation
$$\hat{J}_i(M) = \sum_{\lambda=-q}^{q} w\!\left(\frac{\lambda}{M}\right) \tilde{J}_{i+\lambda}$$
where $w(x)$ is the kernel or weight function. At the endpoints, the moving average is computed cyclically; that is,
$$\tilde{J}_{i+\lambda} = \begin{cases} J_{i+\lambda} & 0 \le i+\lambda \le q \\ J_{-(i+\lambda)} & i+\lambda < 0 \\ J_{2q-(i+\lambda)} & i+\lambda > q \end{cases}$$
where $J_i$ is the $i$th periodogram ordinate. The TIMESERIES procedure supports the following kernels:

BART: Bartlett kernel
$$w(x) = \begin{cases} 1 - |x| & |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

PARZEN: Parzen kernel
$$w(x) = \begin{cases} 1 - 6|x|^2 + 6|x|^3 & 0 \le |x| \le \tfrac{1}{2} \\ 2\,(1 - |x|)^3 & \tfrac{1}{2} \le |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

TUKEY: Tukey-Hanning equivalent lag window filter
$$w(\theta) = \tfrac{1}{4} D_M(\theta - \pi/M) + \tfrac{1}{2} D_M(\theta) + \tfrac{1}{4} D_M(\theta + \pi/M)$$

TRUNC: truncated equivalent lag window filter
$$w(\theta) = D_M(\theta)$$

where $D_M(\theta)$ is the Dirichlet kernel
$$D_M(\theta) = \frac{\sin[(M + 1/2)\,\theta]}{2\pi \sin(\theta/2)}$$
Singular Spectrum Analysis

Given a time series, $y_t$, for $t = 1, \ldots, T$, and a window length, $2 \le L < T/2$, singular spectrum analysis (Golyandina, Nekrutkin, and Zhigljavsky 2001) decomposes the time series into spectral groupings using the following steps:

Embedding Step

Using the time series, form a $K \times L$ trajectory matrix, $\mathbf{X}$, with elements
$$\mathbf{X} = \{x_{k,l}\}_{k=1,\,l=1}^{K,\,L} \quad \text{such that} \quad x_{k,l} = y_{k+l-1} \;\text{ for } k = 1, \ldots, K \text{ and } l = 1, \ldots, L$$
and where $K = T - L + 1$. By definition $L \le K < T$, because $2 \le L < T/2$.

Decomposition Step

Using the trajectory matrix, $\mathbf{X}$, apply singular value decomposition to the trajectory matrix
$$\mathbf{X} = \mathbf{U}\mathbf{Q}\mathbf{V}^T$$
where $\mathbf{U}$ represents the $K \times L$ matrix that contains the left-hand-side (LHS) eigenvectors, $\mathbf{Q}$ represents the diagonal $L \times L$ matrix that contains the singular values, and $\mathbf{V}$ represents the $L \times L$ matrix that contains the right-hand-side (RHS) eigenvectors. Therefore,
$$\mathbf{X} = \sum_{l=1}^{L} \mathbf{X}^{(l)} = \sum_{l=1}^{L} \mathbf{u}_l\, q_l\, \mathbf{v}_l^T$$
where $\mathbf{X}^{(l)}$ represents the $K \times L$ principal component matrix, $\mathbf{u}_l$ represents the $K \times 1$ left-hand-side (LHS) eigenvector, $q_l$ represents the singular value, and $\mathbf{v}_l$ represents the $L \times 1$ right-hand-side (RHS) eigenvector associated with the $l$th window index.

Grouping Step

For each group index, $m = 1, \ldots, M$, define a group of window indices $I_m \subset \{1, \ldots, L\}$. Let
$$\mathbf{X}_{I_m} = \sum_{l \in I_m} \mathbf{X}^{(l)} = \sum_{l \in I_m} \mathbf{u}_l\, q_l\, \mathbf{v}_l^T$$
represent the grouped trajectory matrix for group $I_m$. If the groupings represent a spectral partition,
$$\bigcup_{m=1}^{M} I_m = \{1, \ldots, L\} \quad \text{and} \quad I_m \cap I_n = \emptyset \;\text{ for } m \neq n$$
then according to the singular value decomposition theory,
$$\mathbf{X} = \sum_{m=1}^{M} \mathbf{X}_{I_m}$$

Averaging Step

For each group index, $m = 1, \ldots, M$, compute the diagonal average of $\mathbf{X}_{I_m}$,
$$\tilde{x}_t^{(m)} = \frac{1}{n_t} \sum_{l=s_t}^{e_t} x_{t-l+1,\,l}^{(m)}$$
where
$$\begin{aligned} s_t &= 1, & e_t &= t, & n_t &= t & &\text{for } 1 \le t < L \\ s_t &= 1, & e_t &= L, & n_t &= L & &\text{for } L \le t \le T - L + 1 \\ s_t &= t - T + L, & e_t &= L, & n_t &= T - t + 1 & &\text{for } T - L + 1 < t \le T \end{aligned}$$
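A sketch of requesting an SSA decomposition (data set hypothetical; LENGTH= sets the window length $L$, and the GROUPS= specification shown here, which collects window indices into two spectral groupings, is illustrative):

proc timeseries data=work.transactions outssa=ssa;
   id date interval=month accumulate=total;
   var sales;
   /* window length L=12; group indices 1-2 and 3-4 separately */
   ssa / length=12 groups=(1 2)(3 4);
run;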
Functional Summary

The statements and options used with the TSCSREG procedure are summarized in the following table.

Table 30.1  Functional Summary

Description                                                   Statement   Option

Data Set Options
specify the input data set                                    TSCSREG     DATA=
write parameter estimates to an output data set               TSCSREG     OUTEST=
include correlations in the OUTEST= data set                  TSCSREG     CORROUT
include covariances in the OUTEST= data set                   TSCSREG     COVOUT
specify number of time series observations                    TSCSREG     TS=
specify number of cross sections                              TSCSREG     CS=

Declaring the Role of Variables
specify BY-group processing                                   BY
specify the cross section and time ID variables               ID

Printing Control Options
print correlations of the estimates                           MODEL       CORRB
print covariances of the estimates                            MODEL       COVB
suppress printed output                                       MODEL       NOPRINT
perform tests of linear hypotheses                            TEST

Model Estimation Options
specify the one-way fixed-effects model                       MODEL       FIXONE
specify the two-way fixed-effects model                       MODEL       FIXTWO
specify the one-way random-effects model                      MODEL       RANONE
specify the two-way random-effects model                      MODEL       RANTWO
specify the Fuller-Battese method                             MODEL       FULLER
specify the Parks method                                      MODEL       PARKS
specify the Da Silva method                                   MODEL       DASILVA
specify order of the moving-average error process
   for the Da Silva method                                    MODEL       M=
print the Φ matrix for the Parks method                       MODEL       PHI
print autocorrelation coefficients for the Parks method       MODEL       RHO
suppress the intercept term                                   MODEL       NOINT
control the check for singularity                             MODEL       SINGULAR=
PROC TSCSREG Statement

PROC TSCSREG options ;

The following options can be specified in the PROC TSCSREG statement.

DATA=SAS-data-set
    names the input data set. The input data set must be sorted by cross section and by time period within cross section. If you omit the DATA= option, the most recently created SAS data set is used.

TS=number
    specifies the number of observations in the time series for each cross section. The TS= option value must be greater than 1. The TS= option is required unless an ID statement is used. Note that the number of observations for each time series must be the same for each cross section and must cover the same time period.

CS=number
    specifies the number of cross sections. The CS= option value must be greater than 1. The CS= option is required unless an ID statement is used.

OUTEST=SAS-data-set
    names an output data set to contain the parameter estimates. When the OUTEST= option is not specified, the OUTEST= data set is not created.

OUTCOV
COVOUT
    writes the covariance matrix of the parameter estimates to the OUTEST= data set.

OUTCORR
CORROUT
    writes the correlation matrix of the parameter estimates to the OUTEST= data set.

In addition, any of the following MODEL statement options can be specified in the PROC TSCSREG statement: CORRB, COVB, FIXONE, FIXTWO, RANONE, RANTWO, FULLER, PARKS, DASILVA, NOINT, NOPRINT, M=, PHI, RHO, and SINGULAR=. When specified in the PROC TSCSREG statement, these options are equivalent to specifying the options for every MODEL statement.
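A minimal sketch combining these options (the data set name a and the variables y, x1, and x2 are illustrative, following the naming used in the examples below):

proc tscsreg data=a cs=10 ts=20 outest=est covout;
   /* Fuller-Battese variance components error structure */
   model y = x1 x2 / fuller;
run;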
BY Statement

BY variables ;

A BY statement can be used with PROC TSCSREG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the input data set must be sorted by the BY variables as well as by cross section and time period within the BY groups.

When both an ID statement and a BY statement are specified, the input data set must be sorted first with respect to BY variables and then with respect to the cross section and time series ID variables. For example:

proc sort data=a;
   by byvar1 byvar2 csid tsid;
run;

proc tscsreg data=a;
   by byvar1 byvar2;
   id csid tsid;
   ...
run;
When both a BY statement and an ID statement are used, the data set might have a different number of cross sections or a different number of time periods in each BY group. If no ID statement is used, the CS=N and TS=T options must be specified and each BY group must contain N T observations.
ID Statement

ID cross-section-id-variable time-series-id-variable ;

The ID statement is used to specify variables in the input data set that identify the cross section and time period for each observation. When an ID statement is used, the TSCSREG procedure verifies that the input data set is sorted by the cross section ID variable and by the time series ID variable within each cross section. The TSCSREG procedure also verifies that the time series ID values are the same for all cross sections.

To make sure the input data set is correctly sorted, use PROC SORT with a BY statement with the variables listed exactly as they are listed in the ID statement to sort the input data set. For example:

proc sort data=a;
   by csid tsid;
run;

proc tscsreg data=a;
   id csid tsid;
   ... etc. ...
run;
If the ID statement is not used, the TS= and CS= options must be specified on the PROC TSCSREG statement. Note that the input data must be sorted by time within cross section, regardless of whether the cross section structure is given by an ID statement or by the options TS= and CS=. If an ID statement is specified, the time series length T is set to the minimum number of observations for any cross section, and only the first T observations in each cross section are used. If both the ID statement and the TS= and CS= options are specified, the TS= and CS= options are ignored.
MODEL Statement

MODEL response = regressors / options ;

The MODEL statement specifies the regression model and the error structure assumed for the regression residuals. The response variable on the left side of the equal sign is regressed on the independent variables listed after the equal sign. Any number of MODEL statements can be used. For each MODEL statement, only one response variable can be specified on the left side of the equal sign.

The error structure is specified by the FIXONE, FIXTWO, RANONE, RANTWO, FULLER, PARKS, and DASILVA options. More than one of these options can be used, in which case the analysis is repeated for each error structure model specified.

Models can be given labels of up to 32 characters in length. Model labels are used in the printed output to identify the results for different models. If no label is specified, the response variable name is used as the label for the model. The model label is specified as follows:

label: MODEL response = regressors / options ;

The following options can be specified in the MODEL statement after a slash (/).

CORRB
CORR
    prints the matrix of estimated correlations between the parameter estimates.

COVB
VAR
    prints the matrix of estimated covariances between the parameter estimates.

FIXONE
    specifies that a one-way fixed-effects model be estimated, with the one-way model corresponding to group effects only.

FIXTWO
    specifies that a two-way fixed-effects model be estimated.

RANONE
    specifies that a one-way random-effects model be estimated.

RANTWO
    specifies that a two-way random-effects model be estimated.

FULLER
    specifies that the model be estimated by using the Fuller-Battese method, which assumes a variance components model for the error structure.

PARKS
    specifies that the model be estimated by using the Parks method, which assumes a first-order autoregressive model for the error structure.

DASILVA
    specifies that the model be estimated by using the Da Silva method, which assumes a mixed variance-component moving-average model for the error structure.

M=number
    specifies the order of the moving-average process in the Da Silva method. The M= value must be less than T − 1. The default is M=1.

PHI
    prints the Φ matrix of estimated covariances of the observations for the Parks method. The PHI option is relevant only when the PARKS option is used.

RHO
    prints the estimated autocorrelation coefficients for the Parks method.

NOINT
NOMEAN
    suppresses the intercept parameter from the model.

NOPRINT
    suppresses the normal printed output.

SINGULAR=number
    specifies a singularity criterion for the inversion of the matrix. The default depends on the precision of the computer system.
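A brief sketch (data set and variable names illustrative) showing a labeled model estimated under two error structures in a single MODEL statement:

proc tscsreg data=a;
   id csid tsid;
   /* the analysis is repeated for the FULLER and PARKS
      error structures; RHO applies to the Parks results */
   demand: model y = x1 x2 / fuller parks rho;
run;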
TEST Statement

TEST equation < , equation . . . > < / options > ;

The TEST statement performs F tests of linear hypotheses about the regression parameters in the preceding MODEL statement. Each equation specifies a linear hypothesis to be tested. All hypotheses in one TEST statement are tested jointly. Variable names in the equations must correspond to regressors in the preceding MODEL statement, and each name represents the coefficient of the corresponding regressor. The keyword INTERCEPT refers to the coefficient of the intercept.

The following statements illustrate the use of the TEST statement:

proc tscsreg;
   model y = x1 x2 x3;
   test x1 = 0, x2 * .5 + 2 * x3 = 0;
   test_int: test intercept = 0, x3 = 0;

Note that a test of the following form is not permitted:

test_bad: test x2 / 2 + 2 * x3 = 0;

Do not use the division sign in TEST statement equations.
Details: The TSCSREG Procedure

Models, estimators, and methods are covered in detail in Chapter 19, “The PANEL Procedure.”
ODS Table Names

PROC TSCSREG assigns a name to each table it creates. You can use these names to reference the table when you use the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table.

Table 30.2  ODS Tables Produced in PROC TSCSREG

ODS Tables Created by the MODEL Statement

ODS Table Name             Description                                 Option
ModelDescription           Model description                           default
FitStatistics              Fit statistics                              default
FixedEffectsTest           F test for no fixed effects                 FIXONE, FIXTWO, RANONE, RANTWO
ParameterEstimates         Parameter estimates                         default
CovB                       Covariance of parameter estimates           COVB
CorrB                      Correlations of parameter estimates         CORRB
VarianceComponents         Variance component estimates                FULLER, DASILVA, M=, RANONE, RANTWO
RandomEffectsTest          Hausman test for random effects             FULLER, DASILVA, M=, RANONE, RANTWO
AR1Estimates               First order autoregressive parameter
                           estimates                                   PARKS, RHO
EstimatedPhiMatrix         Estimated phi matrix                        PARKS
EstimatedAutocovariances   Estimates of autocovariances                DASILVA, M=

ODS Tables Created by the TEST Statement

TestResults                Test results
Examples: The TSCSREG Procedure

For examples of analysis of panel data, see Chapter 19, “The PANEL Procedure.”
Acknowledgments: TSCSREG Procedure

The original TSCSREG procedure was developed by Douglas J. Drummond and A. Ronald Gallant, and contributed to the Version 5 SUGI Supplemental Library in 1979. The original code was changed substantially over the years. Additional new methods as well as other new features are currently included in the PANEL procedure. SAS Institute would like to thank Dr. Drummond and Dr. Gallant for their contribution of the original version of the TSCSREG procedure.
Chapter 31
The UCM Procedure

Contents

Overview: UCM Procedure
Getting Started: UCM Procedure
    A Seasonal Series with Linear Trend
Syntax: UCM Procedure
    Functional Summary
    PROC UCM Statement
    AUTOREG Statement
    BLOCKSEASON Statement
    BY Statement
    CYCLE Statement
    DEPLAG Statement
    ESTIMATE Statement
    FORECAST Statement
    ID Statement
    IRREGULAR Statement
    LEVEL Statement
    MODEL Statement
    NLOPTIONS Statement
    OUTLIER Statement
    RANDOMREG Statement
    SEASON Statement
    SLOPE Statement
    SPLINEREG Statement
    SPLINESEASON Statement
Details: UCM Procedure
    An Introduction to Unobserved Component Models
    The UCMs as State Space Models
    Outlier Detection
    Missing Values
    Parameter Estimation
    Computational Issues
    Displayed Output
    Statistical Graphics
    ODS Table Names
    ODS Graph Names
    OUTFOR= Data Set
    OUTEST= Data Set
    Statistics of Fit
Examples: UCM Procedure
    Example 31.1: The Airline Series Revisited
    Example 31.2: Variable Star Data
    Example 31.3: Modeling Long Seasonal Patterns
    Example 31.4: Modeling Time-Varying Regression Effects
    Example 31.5: Trend Removal Using the Hodrick-Prescott Filter
    Example 31.6: Using Splines to Incorporate Nonlinear Effects
    Example 31.7: Detection of Level Shift
    Example 31.8: ARIMA Modeling
References
Overview: UCM Procedure

The UCM procedure analyzes and forecasts equally spaced univariate time series data by using an unobserved components model (UCM). The UCMs are also called structural models in the time series literature. A UCM decomposes the response series into components such as trend, seasonals, cycles, and the regression effects due to predictor series. The components in the model are supposed to capture the salient features of the series that are useful in explaining and predicting its behavior. Harvey (1989) is a good reference for time series modeling that uses the UCMs. Harvey calls the components in a UCM the “stylized facts” about the series under consideration. Traditionally, the ARIMA models and, to some limited extent, the exponential smoothing models have been the main tools in the analysis of this type of time series data. It is fair to say that the UCMs capture the versatility of the ARIMA models while possessing the interpretability of the smoothing models. A thorough discussion of the correspondence between the ARIMA models and the UCMs, and the relative merits of UCM and ARIMA modeling, is given in Harvey (1989). The UCMs are also very similar to another set of models, called the dynamic models, that are popular in the Bayesian time series literature (West and Harrison 1999). In SAS/ETS you can use PROC ARIMA for ARIMA modeling (see Chapter 7, “The ARIMA Procedure”), PROC ESM for exponential smoothing modeling (see Chapter 13, “The ESM Procedure”), and use the Time Series Forecasting System for a point-and-click interface to ARIMA and exponential smoothing modeling.

You can use the UCM procedure to fit a wide range of UCMs that can incorporate complex trend, seasonal, and cyclical patterns and can include multiple predictors. It provides a variety of diagnostic tools to assess the fitted model and to suggest the possible extensions or modifications. The components in the UCM provide a succinct description of the underlying mechanism governing the series. You can print, save, or plot the estimates of these component series. Along with the standard forecast and residual plots, the study of these component plots is an essential part of time series analysis using the UCMs.

Once a suitable UCM is found for the series under consideration, it can be used for a variety of purposes. For example, it can be used for the following:
- forecasting the values of the response series and the component series in the model
- obtaining a model-based seasonal decomposition of the series
- obtaining a “denoised” version and interpolating the missing values of the response series in the historical period
- obtaining the full sample or “smoothed” estimates of the component series in the model
Getting Started: UCM Procedure

The analysis of time series using the UCMs involves recognizing the salient features present in the series and modeling them suitably. The UCM procedure provides a variety of models for estimating and forecasting the commonly observed features in time series. These models are discussed in detail later in the section “An Introduction to Unobserved Component Models” on page 1973. First the procedure is illustrated using an example.
A Seasonal Series with Linear Trend

The airline passenger series, given as Series G in Box and Jenkins (1976), is often used in time series literature as an example of a nonstationary seasonal time series. This series is a monthly series consisting of the number of airline passengers who traveled during the years 1949 to 1960. Its main features are a steady rise in the number of passengers from year to year and the seasonal variation in the numbers during any given year. It also exhibits an increase in variability around the trend. A log transformation is used to stabilize this variability. The following DATA step prepares the log-transformed passenger series analyzed in this example:

data seriesG;
   set sashelp.air;
   logair = log( air );
run;
The following statements produce a time series plot of the series by using the TIMESERIES procedure (see Chapter 29, “The TIMESERIES Procedure”). The trend and seasonal features of the series are apparent in the plot in Figure 31.1.

ods graphics on;

proc timeseries data=seriesG plot=series;
   id date interval=month;
   var logair;
run;
Figure 31.1 Series Plot of Log-Transformed Airline Passenger Series
In this example this series is modeled using an unobserved component model called the basic structural model (BSM). The BSM models a time series as a sum of three stochastic components: a trend component $\mu_t$, a seasonal component $\gamma_t$, and random error $\epsilon_t$. Formally, a BSM for a response series $y_t$ can be described as
$$y_t = \mu_t + \gamma_t + \epsilon_t$$
Each of the stochastic components in the model is modeled separately. The random error $\epsilon_t$, also called the irregular component, is modeled simply as a sequence of independent, identically distributed (i.i.d.) zero-mean Gaussian random variables. The trend and the seasonal components can be modeled in a few different ways. The model for trend used here is called a locally linear time trend. This trend model can be written as follows:
$$\begin{aligned} \mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t, & \eta_t &\sim \text{i.i.d. } N(0, \sigma_\eta^2) \\ \beta_t &= \beta_{t-1} + \xi_t, & \xi_t &\sim \text{i.i.d. } N(0, \sigma_\xi^2) \end{aligned}$$
These equations specify a trend where the level $\mu_t$ as well as the slope $\beta_t$ is allowed to vary over time. This variation in slope and level is governed by the variances of the disturbance terms $\eta_t$ and $\xi_t$ in their respective equations. Some interesting special cases of this model arise when you manipulate these disturbance variances. For example, if the variance of $\xi_t$ is zero, the slope will be constant (equal to $\beta_0$); if the variance of $\eta_t$ is also zero, $\mu_t$ will be a deterministic trend given by the line $\mu_0 + \beta_0 t$.

The seasonal model used in this example is called a trigonometric seasonal. The stochastic equations governing a trigonometric seasonal are explained later (see the section “Modeling Seasons” on page 1975). However, it is interesting to note here that this seasonal model reduces to the familiar regression with deterministic seasonal dummies if the variance of the disturbance terms in its equations is equal to zero.

The following statements specify a BSM with these three components:

proc ucm data=seriesG;
   id date interval=month;
   model logair;
   irregular;
   level;
   slope;
   season length=12 type=trig print=smooth;
   estimate;
   forecast lead=24 print=decomp;
run;
The PROC UCM statement signifies the start of the UCM procedure, and the input data set, seriesG, containing the dependent series is specified there. The optional ID statement is used to specify a date, datetime, or time identification variable, date in this example, to label the observations. The INTERVAL=MONTH option in the ID statement indicates that the measurements were collected on a monthly basis. The model specification begins with the MODEL statement, where the response series is specified (logair in this case). After this the components in the model are specified using separate statements that enable you to control their individual properties. The irregular component $\epsilon_t$ is specified using the IRREGULAR statement and the trend component $\mu_t$ is specified using the LEVEL and SLOPE statements. The seasonal component $\gamma_t$ is specified using the SEASON statement. The specifics of the seasonal characteristics such as the season length, its stochastic evolution properties, etc., are specified using the options in the SEASON statement. The seasonal component used in this example has a season length of 12, corresponding to the monthly seasonality, and is of the trigonometric type. Different types of seasonals are explained later (see the section “Modeling Seasons” on page 1975).

The parameters of this model are the variances of the disturbance terms in the evolution equations of $\mu_t$, $\beta_t$, and $\gamma_t$ and the variance of the irregular component $\epsilon_t$. These parameters are estimated by maximizing the likelihood of the data. The ESTIMATE statement options can be used to specify the span of data used in parameter estimation and to display and save the results of the estimation step and the model diagnostics. You can use the estimated model to obtain the forecasts of the series as well as the components. The options in the individual component statements can be used to display the component forecasts; for example, the PRINT=SMOOTH option in the SEASON statement requests the displaying of smoothed forecasts of the seasonal component $\gamma_t$. The series forecasts and forecasts of the sum of components can be requested using the FORECAST statement. The option PRINT=DECOMP in the FORECAST statement requests the printing of the smoothed trend $\mu_t$ and the trend plus seasonal component ($\mu_t + \gamma_t$).

The parameter estimates for this model are displayed in Figure 31.2.
Figure 31.2  BSM for the Logair Series

The UCM Procedure

Final Estimates of the Free Parameters

Component   Parameter        Estimate      Approx Std Error   t Value   Approx Pr > |t|
Irregular   Error Variance   0.00023436    0.0001079           2.17     0.0298
Level       Error Variance   0.00029828    0.0001057           2.82     0.0048
Slope       Error Variance   8.47911E-13   6.2271E-10          0.00     0.9989
Season      Error Variance   0.00000356    1.32347E-6          2.69     0.0072
The estimates suggest that except for the slope component, the disturbance variances of all the components are significant—that is, all these components are stochastic. The slope component, however, appears to be deterministic because its error variance is quite insignificant. It might then be useful to check whether the slope component can be dropped from the model—that is, whether $\beta_0 = 0$. This can be checked by examining the significance analysis table of the components given in Figure 31.3.

Figure 31.3  Component Significance Analysis for the Logair Series

Significance Analysis of Components (Based on the Final State)

Component   DF   Chi-Square   Pr > ChiSq
Irregular    1         0.08       0.7747
Level        1       117867       <.0001
Slope        1        43.78       <.0001
Season      11       507.75       <.0001
The fitted model is given as
$$\Delta \mathbf{y}_t = \begin{pmatrix} \underset{(0.048)}{-0.467} & \underset{(0.094)}{0.913} \\[6pt] \underset{(0.051)}{0.107} & \underset{(0.100)}{-0.209} \end{pmatrix} \mathbf{y}_{t-1} + \begin{pmatrix} \underset{(0.045)}{-0.743} & \underset{(0.048)}{-0.746} \\[6pt] \underset{(0.049)}{0.405} & \underset{(0.051)}{-0.572} \end{pmatrix} \Delta \mathbf{y}_{t-1} + \boldsymbol{\epsilon}_t$$
where the numbers in parentheses are standard errors.
Figure 32.16  Change the VECM(2) Form to the VAR(2) Model

Infinite Order AR Representation

Lag   Variable         y1         y2
1     y1         -0.21013    0.16674
      y2          0.51160    0.21980
2     y1          0.74332    0.74621
      y2         -0.40493    0.57157
3     y1          0.00000    0.00000
      y2          0.00000    0.00000
The PRINT=(IARR) option in the previous SAS statements prints the reparameterized coefficient estimates. For LAGMAX=3 in the SAS statements, the coefficient matrix of lag 3 is zero. The VECM(2) form in Figure 32.16 can be rewritten as the following second-order vector autoregressive model:
$$\mathbf{y}_t = \begin{pmatrix} -0.210 & 0.167 \\ 0.512 & 0.220 \end{pmatrix} \mathbf{y}_{t-1} + \begin{pmatrix} 0.743 & 0.746 \\ -0.405 & 0.572 \end{pmatrix} \mathbf{y}_{t-2} + \boldsymbol{\epsilon}_t$$
Bayesian Vector Error Correction Model

Bayesian inference on a cointegrated system begins by using the priors of $\beta$ obtained from the VECM(p) form. Bayesian vector error correction models can improve forecast accuracy for cointegrated processes. The following statements fit a BVECM(2) form to the simulated data. You specify both the PRIOR= and ECM= options for the Bayesian vector error correction model. The VARMAX procedure output is shown in Figure 32.17.

/*--- Bayesian Vector Error-Correction Model ---*/

proc varmax data=simul2;
   model y1 y2 / p=2 noint
                 prior=( lambda=0.5 theta=0.2 )
                 ecm=( rank=1 normalize=y1 )
                 print=(estimates);
run;
Figure 32.17 shows the model type fitted to the data, the estimates of the adjustment coefficient ($\alpha$), the parameter estimates in terms of lag one coefficients ($\mathbf{y}_{t-1}$), and lag one first differenced coefficients ($\Delta \mathbf{y}_{t-1}$).
Figure 32.17  Parameter Estimates for the BVECM(2) Form

The VARMAX Procedure

Type of Model        BVECM(2)
Estimation Method    Maximum Likelihood Estimation
Cointegrated Rank    1
Prior Lambda         0.5
Prior Theta          0.2

Alpha

Variable          1
y1         -0.34392
y2          0.16659

Parameter Alpha * Beta' Estimates

Variable         y1         y2
y1         -0.34392    0.67262
y2          0.16659   -0.32581

AR Coefficients of Differenced Lag

DIF Lag   Variable         y1         y2
1         y1         -0.80070   -0.59320
          y2          0.33417   -0.53480
Vector Autoregressive Process with Exogenous Variables

A VAR process can be affected by other observable variables that are determined outside the system of interest. Such variables are called exogenous (independent) variables. Exogenous variables can be stochastic or nonstochastic. The process can also be affected by the lags of exogenous variables. A model used to describe this process is called a VARX(p,s) model.

The VARX(p,s) model is written as
$$\mathbf{y}_t = \boldsymbol{\delta} + \sum_{i=1}^{p} \Phi_i \mathbf{y}_{t-i} + \sum_{i=0}^{s} \Theta_i^* \mathbf{x}_{t-i} + \boldsymbol{\epsilon}_t$$
where $\mathbf{x}_t = (x_{1t}, \ldots, x_{rt})'$ is an $r$-dimensional time series vector and $\Theta_i^*$ is a $k \times r$ matrix.
For example, a VARX(1,0) model is
$$\mathbf{y}_t = \boldsymbol{\delta} + \Phi_1 \mathbf{y}_{t-1} + \Theta_0^* \mathbf{x}_t + \boldsymbol{\epsilon}_t$$
where $\mathbf{y}_t = (y_{1t}, y_{2t}, y_{3t})'$ and $\mathbf{x}_t = (x_{1t}, x_{2t})'$.

The following statements fit the VARX(1,0) model to the given data:

data grunfeld;
   input year y1 y2 y3 x1 x2 x3;
   label y1='Gross Investment GE'
         y2='Capital Stock Lagged GE'
         y3='Value of Outstanding Shares GE Lagged'
         x1='Gross Investment W'
         x2='Capital Stock Lagged W'
         x3='Value of Outstanding Shares Lagged W';
datalines;
1935     33.1    1170.6     97.8    12.93    191.5     1.8
1936     45.0    2015.8    104.4    25.90    516.0      .8
1937     77.2    2803.3    118.0    35.05    729.0     7.4

   ... more lines ...

/*--- Vector Autoregressive Process with Exogenous Variables ---*/

proc varmax data=grunfeld;
   model y1-y3 = x1 x2 / p=1 lagmax=5
                         printform=univariate
                         print=(impulsx=(all) estimates);
run;
The VARMAX procedure output is shown in Figure 32.18 through Figure 32.20.
Figure 32.18 shows the descriptive statistics for the dependent (endogenous) and independent (exogenous) variables with labels.

Figure 32.18  Descriptive Statistics for the VARX(1,0) Model

The VARMAX Procedure

Number of Observations       20
Number of Pairwise Missing    0

Simple Summary Statistics

                                              Standard
Variable   Type          N   Mean            Deviation    Min           Max
y1         Dependent    20    102.29000       48.58450      33.10000     189.60000
y2         Dependent    20   1941.32500      413.84329    1170.60000    2803.30000
y3         Dependent    20    400.16000      250.61885      97.80000     888.90000
x1         Independent  20     42.89150       19.11019      12.93000      90.08000
x2         Independent  20    670.91000      222.39193     191.50000    1193.50000

Variable   Label
y1         Gross Investment GE
y2         Capital Stock Lagged GE
y3         Value of Outstanding Shares GE Lagged
x1         Gross Investment W
x2         Capital Stock Lagged W
Figure 32.19 shows the parameter estimates for the constant, the lag zero coefficients of exogenous variables, and the lag one AR coefficients. From the schematic representation of parameter estimates, the significance of the parameter estimates can be easily verified. The symbol “C” means the constant and “XL0” means the lag zero coefficients of exogenous variables.

Figure 32.19  Parameter Estimates for the VARX(1,0) Model

The VARMAX Procedure

Type of Model        VARX(1,0)
Estimation Method    Least Squares Estimation

Constant

Variable    Constant
y1         -12.01279
y2         702.08673
y3         -22.42110

XLag

Lag   Variable         x1         x2
0     y1          1.69281   -0.00859
      y2         -6.09850    2.57980
      y3         -0.02317   -0.01274

AR

Lag   Variable         y1         y2         y3
1     y1          0.23699    0.00763    0.02941
      y2         -2.46656    0.16379   -0.84090
      y3          0.95116    0.00224    0.93801

Schematic Representation

Variable/Lag   C   XL0   AR1
y1             .   +.    ...
y2             +   .+    ...
y3             -   ..    +.+

+ is > 2*std error, - is < -2*std error, . is between, * is N/A
Figure 32.20 shows the parameter estimates and their significance.

Figure 32.20  Parameter Estimates for the VARX(1,0) Model Continued

Model Parameter Estimates

Equation  Parameter   Estimate     Standard Error   t Value   Pr > |t|   Variable
y1        CONST1      -12.01279     27.47108         -0.44     0.6691    1
          XL0_1_1       1.69281      0.54395          3.11     0.0083    x1(t)
          XL0_1_2      -0.00859      0.05361         -0.16     0.8752    x2(t)
          AR1_1_1       0.23699      0.20668          1.15     0.2722    y1(t-1)
          AR1_1_2       0.00763      0.01627          0.47     0.6470    y2(t-1)
          AR1_1_3       0.02941      0.04852          0.61     0.5548    y3(t-1)
y2        CONST2      702.08673    256.48046          2.74     0.0169    1
          XL0_2_1      -6.09850      5.07849         -1.20     0.2512    x1(t)
          XL0_2_2       2.57980      0.50056          5.15     0.0002    x2(t)
          AR1_2_1      -2.46656      1.92967         -1.28     0.2235    y1(t-1)
          AR1_2_2       0.16379      0.15193          1.08     0.3006    y2(t-1)
          AR1_2_3      -0.84090      0.45304         -1.86     0.0862    y3(t-1)
y3        CONST3      -22.42110     10.31166         -2.17     0.0487    1
          XL0_3_1      -0.02317      0.20418         -0.11     0.9114    x1(t)
          XL0_3_2      -0.01274      0.02012         -0.63     0.5377    x2(t)
          AR1_3_1       0.95116      0.07758         12.26     0.0001    y1(t-1)
          AR1_3_2       0.00224      0.00611          0.37     0.7201    y2(t-1)
          AR1_3_3       0.93801      0.01821         51.50     0.0001    y3(t-1)
The fitted model is given as
$$\begin{pmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \end{pmatrix} = \begin{pmatrix} \underset{(27.471)}{-12.013} \\[4pt] \underset{(256.480)}{702.086} \\[4pt] \underset{(10.312)}{-22.421} \end{pmatrix} + \begin{pmatrix} \underset{(0.544)}{1.693} & \underset{(0.054)}{-0.009} \\[4pt] \underset{(5.078)}{-6.099} & \underset{(0.501)}{2.580} \\[4pt] \underset{(0.204)}{-0.023} & \underset{(0.020)}{-0.013} \end{pmatrix} \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix} + \begin{pmatrix} \underset{(0.207)}{0.237} & \underset{(0.016)}{0.008} & \underset{(0.049)}{0.029} \\[4pt] \underset{(1.930)}{-2.467} & \underset{(0.152)}{0.164} & \underset{(0.453)}{-0.841} \\[4pt] \underset{(0.078)}{0.951} & \underset{(0.006)}{0.002} & \underset{(0.018)}{0.938} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \\ \epsilon_{3t} \end{pmatrix}$$
where the numbers in parentheses are standard errors.
Parameter Estimation and Testing on Restrictions

In the previous example, the VARX(1,0) model is written as
$$\mathbf{y}_t = \boldsymbol{\delta} + \Theta_0^* \mathbf{x}_t + \Phi_1 \mathbf{y}_{t-1} + \boldsymbol{\epsilon}_t$$
with
$$\Phi_1 = \begin{pmatrix} \phi_{11} & \phi_{12} & \phi_{13} \\ \phi_{21} & \phi_{22} & \phi_{23} \\ \phi_{31} & \phi_{32} & \phi_{33} \end{pmatrix} \qquad \Theta_0^* = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \\ \theta_{31} & \theta_{32} \end{pmatrix}$$

In Figure 32.20 of the preceding section, you can see several insignificant parameters. For example, the coefficients XL0_1_2, AR1_1_2, and AR1_3_2 are insignificant. The following statements restrict the coefficients $\theta_{12} = \phi_{12} = \phi_{32} = 0$ for the VARX(1,0) model:

/*--- Models with Restrictions and Tests ---*/

proc varmax data=grunfeld;
   model y1-y3 = x1 x2 / p=1 print=(estimates);
   restrict XL(0,1,2)=0, AR(1,1,2)=0, AR(1,3,2)=0;
run;

The output in Figure 32.21 shows that the three parameters $\theta_{12}$, $\phi_{12}$, and $\phi_{32}$ are replaced by the restricted values, zeros. In the schematic representation of parameter estimates, the three restricted parameters $\theta_{12}$, $\phi_{12}$, and $\phi_{32}$ are replaced by *.
Figure 32.21  Parameter Estimation with Restrictions

The VARMAX Procedure

XLag

Lag   Variable         x1         x2
0     y1          1.67592    0.00000
      y2         -6.30880    2.65308
      y3         -0.03576   -0.00919

AR

Lag   Variable         y1         y2         y3
1     y1          0.27671    0.00000    0.01747
      y2         -2.16968    0.10945   -0.93053
      y3          0.96398    0.00000    0.93412

Schematic Representation

Variable/Lag   C   XL0   AR1
y1             .   +*    .*.
y2             +   .+    ..-
y3             -   ..    +*+

+ is > 2*std error, - is < -2*std error, . is between, * is N/A
The output in Figure 32.22 shows the estimates of the Lagrangian parameters and their significance. Based on the p-values associated with the Lagrangian parameters, you cannot reject the null hypotheses $\theta_{12} = 0$, $\phi_{12} = 0$, and $\phi_{32} = 0$ with the 0.05 significance level.

Figure 32.22  RESTRICT Statement Results

Testing of the Restricted Parameters

Parameter    Estimate    Standard Error   t Value   Pr > |t|
XL0_1_2       1.74969         21.44026      0.08      0.9389
AR1_1_2      30.36254         70.74347      0.43      0.6899
AR1_3_2      55.42191        164.03075      0.34      0.7524
The TEST statement in the following example tests $\phi_{31} = 0$ and $\theta_{12} = \phi_{12} = \phi_{32} = 0$ for the VARX(1,0) model:

proc varmax data=grunfeld;
   model y1-y3 = x1 x2 / p=1;
   test AR(1,3,1)=0;
   test XL(0,1,2)=0, AR(1,1,2)=0, AR(1,3,2)=0;
run;
The output in Figure 32.23 shows that the first column in the output is the index corresponding to each TEST statement. You can reject the hypothesis test $\phi_{31} = 0$ at the 0.05 significance level, but you cannot reject the joint hypothesis test $\theta_{12} = \phi_{12} = \phi_{32} = 0$ at the 0.05 significance level.

Figure 32.23  TEST Statement Results

The VARMAX Procedure

Testing of the Parameters

Test   DF   Chi-Square   Pr > ChiSq
   1    1       150.31       <.0001
   2    3         0.34       0.9523
Syntax: VARMAX Procedure

PROC VARMAX options ;
   BY variables ;
   ID variable INTERVAL=value < option > ;
   MODEL dependent variables < = regressors > < , dependent variables < = regressors > . . . > < / options > ;
   GARCH options ;
   NLOPTIONS options ;
   OUTPUT < options > ;
   RESTRICT restrictions ;
   TEST restrictions ;
Functional Summary

The statements and options used with the VARMAX procedure are summarized in the following table:

Table 32.1  VARMAX Functional Summary

Description                                                     Statement   Option

Data Set Options
specify the input data set                                      VARMAX      DATA=
write parameter estimates to an output data set                 VARMAX      OUTEST=
include covariances in the OUTEST= data set                     VARMAX      OUTCOV
write the diagnostic checking tests for a model and the
   cointegration test results to an output data set             VARMAX      OUTSTAT=
write actuals, predictions, residuals, and confidence
   limits to an output data set                                 OUTPUT      OUT=
write the conditional covariance matrix to an output
   data set                                                     GARCH       OUTHT=

BY Groups
specify BY-group processing                                     BY

ID Variable
specify identifying variable                                    ID
specify the time interval between observations                  ID          INTERVAL=
control the alignment of SAS Date values                        ID          ALIGN=

Options to Control the Optimization Process
specify the optimization options                                NLOPTIONS

Printing Control Options
specify how many lags to print results                          MODEL       LAGMAX=
suppress the printed output                                     MODEL       NOPRINT
request all printing options                                    MODEL       PRINTALL
request the printing format                                     MODEL       PRINTFORM=
controls plots produced through ODS GRAPHICS                    VARMAX      PLOTS=

PRINT= Option
print the correlation matrix of parameter estimates            MODEL       CORRB
print the cross-correlation matrices of independent
   variables                                                    MODEL       CORRX
print the cross-correlation matrices of dependent
   variables                                                    MODEL       CORRY
print the covariance matrices of prediction errors              MODEL       COVPE
print the cross-covariance matrices of the independent
   variables                                                    MODEL       COVX
print the cross-covariance matrices of the dependent
   variables                                                    MODEL       COVY
print the covariance matrix of parameter estimates              MODEL       COVB
print the decomposition of the prediction error
   covariance matrix                                            MODEL       DECOMPOSE
print the residual diagnostics                                  MODEL       DIAGNOSE
print the contemporaneous relationships among the
   components of the vector time series                         MODEL       DYNAMIC
print the parameter estimates                                   MODEL       ESTIMATES
print the infinite order AR representation                      MODEL       IARR
print the impulse response function                             MODEL       IMPULSE=
print the impulse response function in the transfer
   function                                                     MODEL       IMPULSX=
print the partial autoregressive coefficient matrices           MODEL       PARCOEF
print the partial canonical correlation matrices                MODEL       PCANCORR
print the partial correlation matrices                          MODEL       PCORR
print the eigenvalues of the companion matrix                   MODEL       ROOTS
print the Yule-Walker estimates                                 MODEL       YW

Model Estimation and Order Selection Options
center the dependent variables                                  MODEL       CENTER
specify the degrees of differencing for the specified
   model variables                                              MODEL       DIF=
specify the degrees of differencing for all independent
   variables                                                    MODEL       DIFX=
specify the degrees of differencing for all dependent
   variables                                                    MODEL       DIFY=
specify the vector error correction model                       MODEL       ECM=
specify the estimation method                                   MODEL       METHOD=
select the tentative order                                      MODEL       MINIC=
suppress the current values of independent variables            MODEL       NOCURRENTX
suppress the intercept parameters                               MODEL       NOINT
specify the number of seasonal periods                          MODEL       NSEASON=
specify the order of autoregressive polynomial                  MODEL       P=
specify the Bayesian prior model                                MODEL       PRIOR=
specify the order of moving-average polynomial                  MODEL       Q=
center the seasonal dummies                                     MODEL       SCENTER
specify the degree of time trend polynomial                     MODEL       TREND=
specify the denominator for error covariance matrix
   estimates                                                    MODEL       VARDEF=
specify the lag order of independent variables                  MODEL       XLAG=

GARCH Related Options
specify the GARCH-type model                                    GARCH       FORM=
specify the order of the GARCH polynomial                       GARCH       P=
specify the order of the ARCH polynomial                        GARCH       Q=

Cointegration Related Options
print the results from the weak exogeneity test of the
   long-run parameters                                          COINTEG     EXOGENEITY
specify the restriction on the cointegrated coefficient
   matrix                                                       COINTEG     H=
specify the restriction on the adjustment coefficient
   matrix                                                       COINTEG     J=
specify the variable name whose cointegrating vectors
   are normalized                                               COINTEG     NORMALIZE=
specify a cointegration rank                                    COINTEG     RANK=
print the Johansen cointegration rank test                      MODEL       COINTTEST=(JOHANSEN= )
print the Stock-Watson common trends test                       MODEL       COINTTEST=(SW= )
print the Dickey-Fuller unit root test                          MODEL       DFTEST=

Tests and Restrictions on Parameters
test the Granger causality                                      CAUSAL      GROUP1= GROUP2=
place and test restrictions on parameter estimates              RESTRICT
test hypotheses on parameter estimates                          TEST

Forecasting Control Options
specify the size of confidence limits for forecasting           OUTPUT      ALPHA=
start forecasting before end of the input data                  OUTPUT      BACK=
specify how many periods to forecast                            OUTPUT      LEAD=
suppress the printed forecasts                                  OUTPUT      NOPRINT
PROC VARMAX Statement

PROC VARMAX options ;

The following options can be used in the PROC VARMAX statement:

DATA=SAS-data-set
    specifies the input SAS data set. If the DATA= option is not specified, the PROC VARMAX statement uses the most recently created SAS data set.

OUTEST=SAS-data-set
    writes the parameter estimates to the output data set.

COVOUT
OUTCOV
    writes the covariance matrix for the parameter estimates to the OUTEST= data set. This option is valid only if the OUTEST= option is specified.

OUTSTAT=SAS-data-set
    writes residual diagnostic results to an output data set. If the COINTTEST=(JOHANSEN) option is specified, the results of this option are also written to the output data set.

The following statements are examples of these options in the PROC VARMAX statement:

proc varmax data=one outest=est outcov outstat=stat;
   model y1-y3 / p=1;
run;

proc varmax data=one outest=est outstat=stat;
   model y1-y3 / p=1 cointtest=(johansen);
run;
PLOTS< (global-plot-option) > = plot-request-option < (options) > PLOTS< (global-plot-option) > = ( plot-request-option < (options) > ... plot-request-option < (options) > )
controls the plots produced through ODS Graphics. When you specify only one plot, you can omit the parentheses around the plot request. Some examples follow: plots=none plots=all plots(unpack)=residual(residual normal) plots=(forecasts model)
You must enable ODS Graphics before requesting plots as shown in the following example. For general information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS” (SAS/STAT User’s Guide). ods graphics on; proc varmax data=one plots=impulse(simple); model y1-y3 / p=1; run; proc varmax data=one plots=(model residual); model y1-y3 / p=1; run; proc varmax data=one plots=forecasts; model y1-y3 / p=1; output lead=12; run;
The first VARMAX program produces the simple impulse response plots. The second VARMAX program produces the plots associated with the model and prediction errors. The plots associated with prediction errors are the ACF, PACF, IACF, distribution, white-noise, and normal quantile plots and the prediction error plot. The third VARMAX program produces the FORECASTS and FORECASTSONLY plots.

The global-plot-option applies to the impulse and prediction error analysis plots generated by the VARMAX procedure. The following global-plot-option is available:

UNPACK
   breaks a graphic that is otherwise paneled into individual component plots.
The following plot-request-options are available:

ALL
   produces all plots appropriate for the particular analysis.

FORECASTS < (forecasts-plot-options) >
   produces plots of the forecasts. The forecasts-only plot that shows the multistep forecasts in the forecast region is produced by default. The following forecasts-plot-options are available:

   ALL
      produces the FORECASTSONLY and the FORECASTS plots. This is the default.

   FORECASTS
      produces a plot that shows the one-step-ahead as well as the multistep forecasts.

   FORECASTSONLY
      produces a plot that shows only the multistep forecasts.

IMPULSE < (impulse-plot-options) >
   produces the plots of the impulse response function and the impulse response of the transfer function. The following impulse-plot-options are available:

   ALL
      produces all impulse plots. This is the default.

   ACCUM
      produces the accumulated impulse plot.

   ORTH
      produces the orthogonalized impulse plot.

   SIMPLE
      produces the simple impulse plot.

MODEL
   produces plots of the dependent variables listed in the MODEL statement and plots of the one-step-ahead predicted values for each dependent variable.

NONE
   suppresses all plots.

RESIDUAL < (residual-plot-options) >
   produces plots associated with the prediction errors obtained after modeling the data. The following residual-plot-options are available:

   ALL
      produces all plots associated with the analysis of the prediction errors. This is the default.

   RESIDUAL
      produces the prediction error plot.

   DIAGNOSTICS
      produces a panel of plots useful in assessing the autocorrelations and white noise of the prediction errors. The panel consists of the following:

         the autocorrelation plot of the prediction errors
         the partial autocorrelation plot of the prediction errors
         the inverse autocorrelation plot of the prediction errors
         the log scaled white noise plot of the prediction errors

   NORMAL
      produces a panel of plots useful in assessing normality of the prediction errors. The panel consists of the following:

         the distribution of the prediction errors with the normal curve overlaid
         the normal quantile plot of the prediction errors
Other Options

In addition, any of the following MODEL statement options can be specified in the PROC VARMAX statement, which is equivalent to specifying the option for every MODEL statement: CENTER, DFTEST=, DIF=, DIFX=, DIFY=, LAGMAX=, METHOD=, MINIC=, NOCURRENTX, NOINT, NOPRINT, NSEASON=, P=, PRINT=, PRINTALL, PRINTFORM=, Q=, SCENTER, TREND=, VARDEF=, and XLAG=.

The following is an example of specifying MODEL statement options in the PROC VARMAX statement:

proc varmax data=one lagmax=3 method=ml;
   model y1-y3 / p=1;
run;
BY Statement

BY variables ;

A BY statement can be used with PROC VARMAX to obtain separate analyses of observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If your input data set is not sorted in ascending order, use one of the following alternatives:

   Sort the data by using the SORT procedure with a similar BY statement.

   Specify the NOTSORTED or DESCENDING option in the BY statement for the VARMAX procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

   Create an index on the BY variables by using the DATASETS procedure.

For more information about the BY statement, see SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.

The following is an example of the BY statement:

proc varmax data=one;
   by region;
   model y1-y3 / p=1;
run;
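If the input data set is not already sorted by the BY variable, it can be sorted first with the SORT procedure. The following is a minimal sketch that reuses the data set name one and the BY variable region from the preceding example; the intermediate data set name one_sorted is arbitrary:

/* sort the input data set by the BY variable first */
proc sort data=one out=one_sorted;
   by region;
run;

proc varmax data=one_sorted;
   by region;
   model y1-y3 / p=1;
run;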
CAUSAL Statement

CAUSAL GROUP1=(variables) GROUP2=(variables) ;

A CAUSAL statement prints the Granger causality test by fitting the VAR(p) model by using all variables defined in GROUP1 and GROUP2. Any number of CAUSAL statements can be specified. The CAUSAL statement is used in conjunction with the MODEL statement and uses the variables and the autoregressive order, p, specified in the MODEL statement. Variables in the GROUP1= and GROUP2= options should be defined in the MODEL statement. If the P=0 option is specified in the MODEL statement, the CAUSAL statement is not applicable.

The null hypothesis of the Granger causality test is that GROUP1 is influenced only by itself, and not by GROUP2. If the hypothesis test fails to reject the null, then the variables listed in GROUP1 might be considered as independent variables. See the section “VAR and VARX Modeling” on page 2133 for details.

The following is an example of the CAUSAL statement. You specify the CAUSAL statement with the GROUP1= and GROUP2= options.

proc varmax data=one;
   model y1-y3 = x1 / p=1;
   causal group1=(x1) group2=(y1-y3);
   causal group1=(y2) group2=(y1 y3);
run;
The first CAUSAL statement fits the VAR(1) model by using the variables y1, y2, y3, and x1 and tests the null hypothesis that x1 causes the other variables, y1, y2, and y3, but the other variables do not cause x1. The second CAUSAL statement fits the VAR(1) model by using the variables y1, y3, and y2 and tests the null hypothesis that y2 causes the other variables, y1 and y3, but the other variables do not cause y2.
COINTEG Statement

COINTEG RANK=number < H=(matrix) > < J=(matrix) > < EXOGENEITY > < NORMALIZE=variable > ;

The COINTEG statement fits the vector error correction model to the data, tests the restrictions of the long-run parameters and the adjustment parameters, and tests for the weak exogeneity in the long-run parameters. The cointegrated system uses the maximum likelihood analysis proposed by Johansen and Juselius (1990) and Johansen (1995a, 1995b). Only one COINTEG statement is allowed.

You specify the ECM= option in the MODEL statement or the COINTEG statement to fit the VECM(p). The P= option in the MODEL statement is used to specify the autoregressive order of the VECM. The following statements are equivalent for fitting a VECM(2):

proc varmax data=one;
   model y1-y3 / p=2 ecm=(rank=1);
run;

proc varmax data=one;
   model y1-y3 / p=2;
   cointeg rank=1;
run;

To test restrictions of either α or β or both, you specify the J= or H= option or both, respectively. You specify the EXOGENEITY option in the COINTEG statement for tests of the weak exogeneity in the long-run parameters. The following is an example of the COINTEG statement:

proc varmax data=one;
   model y1-y3 / p=2;
   cointeg rank=1 h=(1 0, -1 0, 0 1) j=(1 0, 0 0, 0 1) exogeneity;
run;
The following options can be used in the COINTEG statement:

EXOGENEITY
   formulates the likelihood ratio tests for testing weak exogeneity in the long-run parameters. The null hypothesis is that one variable is weakly exogenous for the others.

H=(matrix)
   specifies the restrictions H on the k × r or (k+1) × r cointegrated coefficient matrix β such that β = Hφ, where H is known and φ is unknown. If the VECM(p) is specified with the COINTEG statement or with the ECM= option in the MODEL statement and the ECTREND option is not included with the ECM= specification, then the H matrix has dimension k × m. If the VECM(p) is specified with the COINTEG statement or with the ECM= option in the MODEL statement and the ECTREND option is also used, then the H matrix has dimension (k+1) × m. Here k is the number of dependent variables, and m satisfies r ≤ m < k, where r is defined with the RANK=r option.

   For example, consider a system that contains four variables and the RANK=1 option with β = (β₁, β₂, β₃, β₄)′. The restriction matrix for the test of β₁ + β₂ = 0 can be specified as

   cointeg rank=1 h=(1 0 0, -1 0 0, 0 1 0, 0 0 1);

   Here the matrix H is 4 × 3, where k = 4 and m = 3, and each row of the matrix H is separated by commas.

   When the series has no separate deterministic trend, the constant term should be restricted by α⊥′δ = 0. In the preceding example, β can be either β = (β₁, β₂, β₃, β₄, 1)′ or β = (β₁, β₂, β₃, β₄, t)′. You can specify the restriction matrix for the previous test of β₁ + β₂ = 0 as follows:

   cointeg rank=1 h=(1 0 0 0, -1 0 0 0, 0 1 0 0, 0 0 1 0, 0 0 0 1);

   When the cointegrated system contains three dependent variables and the RANK=2 option is specified, you can specify the restriction matrix for the test of β_{1j} = β_{2j} for j = 1, 2 as follows:

   cointeg rank=2 h=(1 0, -1 0, 0 1);

J=(matrix)
   specifies the restrictions J on the k × r adjustment matrix α such that α = Jψ, where J is known and ψ is unknown. The k × m matrix J is specified by using this option, where k is the number of dependent variables, m satisfies r ≤ m < k, and r is defined with the RANK=r option.

   For example, when the system contains four variables and the RANK=1 option is used, you can specify the restriction matrix for the test of α_j = 0 for j = 2, 3, 4 as follows:

   cointeg rank=1 j=(1, 0, 0, 0);

   When the system contains three variables and the RANK=2 option is specified, you can specify the restriction matrix for the test of α_{2j} = 0 for j = 1, 2 as follows:

   cointeg rank=2 j=(1 0, 0 0, 0 1);
NORMALIZE=variable
specifies a single dependent (endogenous) variable name whose cointegrating vectors are normalized. If the variable name is different from that specified in the COINTTEST=(JOHANSEN= ) or ECM= option in the MODEL statement, the variable name specified in the COINTEG statement is used. If the normalized variable is not specified, cointegrating vectors are not normalized.
RANK=number
specifies the cointegration rank of the cointegrated system. This option is required in the COINTEG statement. The rank of cointegration should be greater than zero and less than the number of dependent (endogenous) variables. If the value of the RANK= option in the COINTEG statement is different from that specified in the ECM= option, the rank specified in the COINTEG statement is used.
ID Statement

ID variable INTERVAL=value < ALIGN=value > ;

The ID statement specifies a variable that identifies observations in the input data set. The datetime variable specified in the ID statement is included in the OUT= data set if the OUTPUT statement is specified. Note that the ID variable is usually a SAS datetime variable. The values of the ID variable are extrapolated for the forecast observations based on the value of the INTERVAL= option.

ALIGN=value
   controls the alignment of SAS dates used to identify output observations. The ALIGN= option allows the following values: BEGINNING | BEG | B, MIDDLE | MID | M, and ENDING | END | E. The default is BEGINNING. The ALIGN= option is used to align the ID variable to the beginning, middle, or end of the time ID interval specified by the INTERVAL= option.

INTERVAL=value
   specifies the time interval between observations. This option is required in the ID statement. The INTERVAL= option is used in conjunction with the ID variable to check that the input data are in order and have no missing periods. The INTERVAL= option is also used to extrapolate the ID values past the end of the input data when the OUTPUT statement is specified.

   The following is an example of the ID statement:

   proc varmax data=one;
      id date interval=qtr align=mid;
      model y1-y3 / p=1;
   run;
MODEL Statement

MODEL dependents < = regressors > < , dependents < = regressors > . . . > < / options > ;
The MODEL statement specifies dependent (endogenous) variables and independent (exogenous) variables for the VARMAX model. The multivariate model can have the same or different independent variables corresponding to the dependent variables. As a special case, the VARMAX procedure allows you to analyze one dependent variable. Only one MODEL statement is allowed.
For example, the following statements are equivalent ways of specifying the multivariate model for the vector (y1, y2, y3):

model y1 y2 y3 ;
model y1-y3 ;
The following statements are equivalent ways of specifying the multivariate model with independent variables, where y1, y2, y3, and y4 are the dependent variables and x1, x2, x3, x4, and x5 are the independent variables:

model y1 y2 y3 y4 = x1 x2 x3 x4 x5 ;
model y1 y2 y3 y4 = x1-x5 ;
model y1 = x1-x5, y2 = x1-x5, y3 y4 = x1-x5 ;
model y1-y4 = x1-x5 ;
When the multivariate model has different independent variables that correspond to each of the dependent variables, the equations are separated by commas (,) and the model can be specified as illustrated by the following MODEL statement:

model y1 = x1-x3, y2 = x3-x5, y3 y4 = x1-x5 ;
The following options can be used in the MODEL statement after a forward slash (/):

CENTER
   centers the dependent (endogenous) variables by subtracting their means. Note that centering is done after differencing when the DIF= or DIFY= option is specified. If there are exogenous (independent) variables, this option is not applicable.

   model y1 y2 / p=1 center;
DIF(variable (number-list) < ... variable (number-list) >)
DIF=(variable (number-list) < ... variable (number-list) >)
   specifies the degrees of differencing to be applied to the specified dependent or independent variables. The number-list must contain one or more numbers, each of which should be greater than zero. The differencing can be the same for all variables, or it can vary among variables. For example, the DIF=(y1(1,4) y3(1) x2(2)) option specifies that the series y1 is differenced at lag 1 and at lag 4, which is

   $$ (1-B^4)(1-B)y_{1t} = (y_{1t} - y_{1,t-1}) - (y_{1,t-4} - y_{1,t-5}) $$

   the series y3 is differenced at lag 1, which is $(y_{3t} - y_{3,t-1})$; and the series x2 is differenced at lag 2, which is $(x_{2t} - x_{2,t-2})$.

   The following statement uses the data dy1, y2, x1, and dx2, where $dy1 = (1-B)y_{1t}$ and $dx2 = (1-B)^2 x_{2t}$:

   model y1 y2 = x1 x2 / p=1 dif=(y1(1) x2(2));
DIFX(number-list)
DIFX=(number-list)
   specifies the degrees of differencing to be applied to all independent variables. The number-list must contain one or more numbers, each of which should be greater than zero. For example, the DIFX=(1) option specifies that all of the independent series are differenced once at lag 1. The DIFX=(1,4) option specifies that all of the independent series are differenced at lag 1 and at lag 4. If independent variables are specified in the DIF= option, then the DIFX= option is ignored.

   The following statement uses the data y1, y2, dx1, and dx2, where $dx1 = (1-B)x_{1t}$ and $dx2 = (1-B)x_{2t}$:

   model y1 y2 = x1 x2 / p=1 difx(1);

DIFY(number-list)
DIFY=(number-list)
   specifies the degrees of differencing to be applied to all dependent (endogenous) variables. The number-list must contain one or more numbers, each of which should be greater than zero. For details, see the DIFX= option. If dependent variables are specified in the DIF= option, then the DIFY= option is ignored.

   model y1 y2 / p=1 dify(1);
METHOD=value
   requests the type of estimates to be computed. The possible values of the METHOD= option are as follows:

   LS   specifies least squares estimates.
   ML   specifies maximum likelihood estimates.

   When the ECM=, PRIOR=, or Q= option or the GARCH statement is specified, the ML method is used regardless of the method given by the METHOD= option.

   model y1 y2 / p=1 method=ml;
NOCURRENTX
   suppresses the current values $x_t$ of the independent variables. In general, the VARX(p,s) model is

   $$ y_t = \delta + \sum_{i=1}^{p} \Phi_i y_{t-i} + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$

   where p is the number of lags of the dependent variables included in the model, and s is the number of lags of the independent variables included in the model, including the contemporaneous values of $x_t$.

   A VARX(1,2) model can be specified as:

   model y1 y2 = x1 x2 / p=1 xlag=2;

   If the NOCURRENTX option is specified, it suppresses the current values $x_t$ and starts with $x_{t-1}$. The VARX(p,s) model is redefined as:

   $$ y_t = \delta + \sum_{i=1}^{p} \Phi_i y_{t-i} + \sum_{i=1}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$

   This model with p = 1 and s = 2 can be specified as:

   model y1 y2 = x1 x2 / p=1 xlag=2 nocurrentx;
NOINT
   suppresses the intercept parameter δ.

   model y1 y2 / p=1 noint;

NSEASON=number
   specifies the number of seasonal periods. When the NSEASON=number option is specified, (number−1) seasonal dummies are added to the regressors. If the NOINT option is specified, the NSEASON= option is not applicable.

   model y1 y2 / p=1 nseason=4;

SCENTER
   centers the seasonal dummies specified by the NSEASON= option. The centered seasonal dummies are generated by c − (1/s), where c is a seasonal dummy generated by the NSEASON=s option. For example, with NSEASON=4 each centered dummy takes the value 3/4 in its own season and −1/4 otherwise.

   model y1 y2 / p=1 nseason=4 scenter;
TREND=value
   specifies the degree of deterministic time trend included in the model. Valid values are as follows:

   LINEAR   includes a linear time trend as a regressor.
   QUAD     includes linear and quadratic time trends as regressors.

   The TREND=QUAD option is not applicable for a cointegration analysis.

   model y1 y2 / p=1 trend=linear;
VARDEF=value
   corrects for the degrees of freedom of the denominator for computing an error covariance matrix for the METHOD=LS option. If the METHOD=ML option is specified, the VARDEF=N option is always used. Valid values are as follows:

   DF   specifies that the number of nonmissing observations minus the number of regressors be used.
   N    specifies that the number of nonmissing observations be used.

   model y1 y2 / p=1 vardef=n;
Printing Control Options

LAGMAX=number
   specifies the maximum number of lags for which results are computed and displayed by the PRINT=(CORRX CORRY COVX COVY IARR IMPULSE= IMPULSX= PARCOEF PCANCORR PCORR) options. This option is also used to limit the printed results for the cross covariances and cross-correlations of residuals. The default is LAGMAX=min(12, T−2), where T is the number of nonmissing observations.

   model y1 y2 / p=1 lagmax=6;

NOPRINT
   suppresses all printed output.

   model y1 y2 / p=1 noprint;

PRINTALL
   requests all printing control options. The options set by the PRINTALL option are DFTEST=, MINIC=, PRINTFORM=BOTH, and PRINT=(CORRB CORRX CORRY COVB COVPE COVX COVY DECOMPOSE DYNAMIC IARR IMPULSE=(ALL) IMPULSX=(ALL) PARCOEF PCANCORR PCORR ROOTS YW). You can also specify this option as ALL.

   model y1 y2 / p=1 printall;
PRINTFORM=value
   requests the printing format of the output generated by the PRINT= option and the cross covariances and cross-correlations of residuals. Valid values are as follows:

   BOTH         prints output in both MATRIX and UNIVARIATE forms.
   MATRIX       prints output in matrix form. This is the default.
   UNIVARIATE   prints output by variables.

   model y1 y2 / p=1 print=(impulse) printform=univariate;
Printing Options

PRINT=(options)

The following options can be used in the PRINT=( ) option. The options are listed within parentheses. If a number in parentheses follows an option listed below, then the option prints the number of lags specified by number in parentheses. The default is the number of lags specified by the LAGMAX=number option.

CORRB
   prints the estimated correlations of the parameter estimates.

CORRX
CORRX(number )
   prints the cross-correlation matrices of exogenous (independent) variables. The number should be greater than zero.

CORRY
CORRY(number )
   prints the cross-correlation matrices of dependent (endogenous) variables. The number should be greater than zero.

COVB
   prints the estimated covariances of the parameter estimates.

COVPE
COVPE(number )
   prints the covariance matrices of number-ahead prediction errors for the VARMAX(p,q,s) model. The number should be greater than zero. If the DIF= or DIFY= option is specified, the covariance matrices of multistep prediction errors are computed based on the differenced data. This option is not applicable when the PRIOR= option is specified. See the section “Forecasting” on page 2122 for details.

COVX
COVX(number )
   prints the cross-covariance matrices of exogenous (independent) variables. The number should be greater than zero.

COVY
COVY(number )
   prints the cross-covariance matrices of dependent (endogenous) variables. The number should be greater than zero.

DECOMPOSE
DECOMPOSE(number )
   prints the decomposition of the prediction error covariances using up to the number of lags specified by number in parentheses for the VARMA(p,q) model. The number should be greater than zero. It can be interpreted as the contribution of innovations in one variable to the mean squared error of the multistep forecast of another variable. The DECOMPOSE option also prints proportions of the forecast error variance. If the DIF= or DIFY= option is specified, the covariance matrices of multistep prediction errors are computed based on the differenced data. This option is not applicable when the PRIOR= option is specified. See the section “Forecasting” on page 2122 for details.

DIAGNOSE
   prints the residual diagnostics and model diagnostics.

DYNAMIC
   prints the contemporaneous relationships among the components of the vector time series.

ESTIMATES
   prints the coefficient estimates and a schematic representation of the significance and sign of the parameter estimates.

IARR
IARR(number )
   prints the infinite order AR representation of a VARMA process. The number should be greater than zero. If the ECM= option and the COINTEG statement are specified, then the reparameterized AR coefficient matrices are printed.

IMPULSE
IMPULSE(number )
IMPULSE=(SIMPLE ACCUM ORTH STDERR ALL)
IMPULSE(number )=(SIMPLE ACCUM ORTH STDERR ALL)
   prints the impulse response function. The number should be greater than zero. It investigates the response of one variable to an impulse in another variable in a system that involves a number of other variables as well. It is an infinite order MA representation of a VARMA process. See the section “Impulse Response Function” on page 2111 for details.

   The following options can be used in the IMPULSE=( ) option. The options are specified within parentheses.

   ACCUM    prints the accumulated impulse response function.
   ALL      is equivalent to specifying all of SIMPLE, ACCUM, ORTH, and STDERR.
   ORTH     prints the orthogonalized impulse response function.
   SIMPLE   prints the impulse response function. This is the default.
   STDERR   prints the standard errors of the impulse response function, the accumulated impulse response function, or the orthogonalized impulse response function.

   If the exogenous variables are used to fit the model, then the STDERR option is ignored.
IMPULSX
IMPULSX(number )
IMPULSX=(SIMPLE ACCUM ALL)
IMPULSX(number )=(SIMPLE ACCUM ALL)
   prints the impulse response function related to exogenous (independent) variables. The number should be greater than zero. See the section “Impulse Response Function” on page 2111 for details.

   The following options can be used in the IMPULSX=( ) option. The options are specified within parentheses.

   ACCUM    prints the accumulated impulse response matrices for the transfer function.
   ALL      is equivalent to specifying both SIMPLE and ACCUM.
   SIMPLE   prints the impulse response matrices for the transfer function. This is the default.
PARCOEF
PARCOEF(number )
   prints the partial autoregression coefficient matrices, $\Phi_{mm}$, up to the lag number. The number should be greater than zero. With a VAR process, this option is useful for the identification of the order, since the $\Phi_{mm}$ have the property that they equal zero for m > p under the hypothetical assumption of a VAR(p) model. See the section “Tentative Order Selection” on page 2127 for details.

PCANCORR
PCANCORR(number )
   prints the partial canonical correlations of the process at lag m and the test for testing $\Phi_m = 0$ for m > p up to the lag number. The number should be greater than zero. The lag m partial canonical correlations are the canonical correlations between $y_t$ and $y_{t-m}$, after adjustment for the dependence of these variables on the intervening values $y_{t-1}, \ldots, y_{t-m+1}$. See the section “Tentative Order Selection” on page 2127 for details.

PCORR
PCORR(number )
   prints the partial correlation matrices. The number should be greater than zero. With a VAR process, this option is useful for a tentative order selection by the same property as the partial autoregression coefficient matrices, as described in the PRINT=(PARCOEF) option. See the section “Tentative Order Selection” on page 2127 for details.

ROOTS
   prints the eigenvalues of the kp × kp companion matrix associated with the AR characteristic function Φ(B), where k is the number of dependent (endogenous) variables and Φ(B) is the finite order matrix polynomial in the backshift operator B, such that $B^i y_t = y_{t-i}$. These eigenvalues indicate the stationary condition of the process, since the stationary condition on the roots of |Φ(B)| = 0 in the VAR(p) model is equivalent to the condition in the corresponding VAR(1) representation that all eigenvalues of the companion matrix be less than one in absolute value. Similarly, you can use this option to check the invertibility of the MA process. In addition, when the GARCH statement is specified, this option prints the roots of the GARCH characteristic polynomials to check covariance stationarity for the GARCH process.

YW
   prints Yule-Walker estimates of the preliminary autoregressive model for the dependent (endogenous) variables. The coefficient matrices are printed using the maximum order of the autoregressive process.

Some examples of the PRINT= option are as follows:

model y1 y2 / p=1 print=(covy(10) corry(10));
model y1 y2 / p=1 print=(parcoef pcancorr pcorr);
model y1 y2 / p=1 print=(impulse(8) decompose(6) covpe(6));
model y1 y2 / p=1 print=(dynamic roots yw);
Lag Specification Options

P=number
P=(number-list)
   specifies the order of the vector autoregressive process. Subset models of vector autoregressive orders can be specified by listing the desired set of lags. For example, you can specify the P=(1,3,4) option. The P=3 option is equivalent to the P=(1,2,3) option. The default is P=0.

   If P=0 and there are no exogenous (independent) variables, then the AR polynomial order is automatically determined by minimizing an information criterion. If P=0 and the PRIOR= or ECM= option or both are specified, then the AR polynomial order is automatically determined.

   If the ECM= option is specified, then subset models of vector autoregressive orders are not allowed and the AR maximum order specified is used.

   Examples illustrating the P= option follow:

   model y1 y2 / p=3;
   model y1 y2 / p=(1,3);
   model y1 y2 / p=(1,3) prior;
Q=number
Q=(number-list)
   specifies the order of the moving-average error process. Subset models of moving-average orders can be specified by listing the desired set of lags. For example, you can specify the Q=(1,5) option. The default is Q=0.

   model y1 y2 / p=1 q=1;
   model y1 y2 / q=(2);
XLAG=number
XLAG=(number-list)
   specifies the lags of exogenous (independent) variables. Subset models of distributed lags can be specified by listing the desired set of lags. For example, XLAG=(2) selects only a lag 2 of the exogenous variables. The default is XLAG=0. To exclude the present values of exogenous variables from the model, the NOCURRENTX option must be used.

   model y1 y2 = x1-x3 / xlag=2 nocurrentx;
   model y1 y2 = x1-x3 / p=1 xlag=(2);
Tentative Order Selection Options

MINIC
MINIC=(TYPE=value P=number Q=number PERROR=number )
   prints the information criterion for the appropriate AR and MA tentative order selection and for the diagnostic checks of the fitted model. If the MINIC= option is not specified, all types of information criteria are printed for diagnostic checks of the fitted model.

   The following options can be used in the MINIC=( ) option. The options are specified within parentheses.

   P=number
   P=(p_min : p_max)
      specifies the range of AR orders to be considered in the tentative order selection. The default is P=(0:5). The P=3 option is equivalent to the P=(0:3) option.

   PERROR=number
   PERROR=(p_{ε,min} : p_{ε,max})
      specifies the range of AR orders for obtaining the error series. The default is PERROR=(p_max : p_max + q_max).

   Q=number
   Q=(q_min : q_max)
      specifies the range of MA orders to be considered in the tentative order selection. The default is Q=(0:5).

   TYPE=value
      specifies the criterion for the model order selection. Valid criteria are as follows:

      AIC    specifies the Akaike information criterion.
      AICC   specifies the corrected Akaike information criterion. This is the default criterion.
      FPE    specifies the final prediction error criterion.
      HQC    specifies the Hannan-Quinn criterion.
      SBC    specifies the Schwarz Bayesian criterion. You can also specify this value as TYPE=BIC.

   model y1 y2 / minic;
   model y1 y2 / minic=(type=aic p=5);
Cointegration Related Options

Two options are related to integrated time series: one is the DFTEST option to test for a unit root, and the other is the COINTTEST option to test for cointegration.

DFTEST
DFTEST=(DLAG=number )
DFTEST=(DLAG=(number ) . . . (number ) )
   prints the Dickey-Fuller unit root tests. The DLAG=(number) . . . (number) option specifies the regular or seasonal unit root test. Supported values of number are 1, 2, 4, and 12. If the number is greater than one, a seasonal Dickey-Fuller test is performed. If the TREND= option is specified, the seasonal unit root test is not available. The default is DLAG=1.

   For example, the DFTEST=(DLAG=(1)(12)) option produces two tables: the Dickey-Fuller regular unit root test and the seasonal unit root test.

   Some examples of the DFTEST= option follow:

   model y1 y2 / p=2 dftest;
   model y1 y2 / p=2 dftest=(dlag=4);
   model y1 y2 / p=2 dftest=(dlag=(1)(12));
   model y1 y2 / p=2 dftest cointtest;
COINTTEST
COINTTEST=(JOHANSEN < (=options) > SW < (=options) > SIGLEVEL=number )
   The following options can be used with the COINTTEST=( ) option. The options are specified within parentheses.

   JOHANSEN
   JOHANSEN=(TYPE=value IORDER=number NORMALIZE=variable)
      prints the cointegration rank test for multivariate time series based on Johansen’s method. This test is provided when the number of dependent (endogenous) variables is less than or equal to 11. See the section “Vector Error Correction Modeling” on page 2153 for details.

      The VARX(p,s) model can be written as the error correction model

      $$ \Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$

      where Π, $\Phi_i^*$, A, and $\Theta_i^*$ are coefficient parameters and $D_t$ is a deterministic term such as a constant, a linear trend, or seasonal dummies.
      The I(1) model is defined by one reduced-rank condition. If the cointegration rank is r < k, then there exist k × r matrices α and β of rank r such that Π = αβ′. The I(1) model is rewritten as the I(2) model

      $$ \Delta^2 y_t = \Pi y_{t-1} - \Psi \Delta y_{t-1} + \sum_{i=1}^{p-2} \Psi_i \Delta^2 y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$

      where $\Psi = I_k - \sum_{i=1}^{p-1} \Phi_i^*$ and $\Psi_i = -\sum_{j=i+1}^{p-1} \Phi_j^*$.

      The I(2) model is defined by two reduced-rank conditions. One is that Π = αβ′, where α and β are k × r matrices of full rank r. The other is that $\alpha_\perp' \Psi \beta_\perp = \xi \eta'$, where ξ and η are (k−r) × s matrices with s ≤ k−r, and $\alpha_\perp$ and $\beta_\perp$ are k × (k−r) matrices of full rank k−r such that α′α⊥ = 0 and β′β⊥ = 0.
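      To make the first reduced-rank condition concrete, with k = 2 dependent variables and cointegration rank r = 1, the matrix Π = αβ′ specializes to the rank-one product

      $$ \Pi = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} \beta_1 & \beta_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 \beta_1 & \alpha_1 \beta_2 \\ \alpha_2 \beta_1 & \alpha_2 \beta_2 \end{pmatrix} $$

      so that each row of Π is a multiple of the single cointegrating vector β′.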
      The following options can be used in the JOHANSEN=( ) option. The options are specified within parentheses.

      IORDER=number
         specifies the integrated order.

         IORDER=1
            prints the cointegration rank test for an integrated order 1 and prints the long-run parameter, β, and the adjustment coefficient, α. This is the default. If the IORDER=1 option is specified, then the AR order should be greater than or equal to 1. When the P=0 option is specified, the value of P is set to 1 for the Johansen test.

         IORDER=2
            prints the cointegration rank test for integrated orders 1 and 2. If the IORDER=2 option is specified, then the AR order should be greater than or equal to 2. If the P=1 option is specified with the IORDER=2 option, then the value of IORDER is set to 1; if the P=0 option is specified with the IORDER=2 option, then the value of P is set to 2.
      NORMALIZE=variable
         specifies the dependent (endogenous) variable name whose cointegration vectors are to be normalized. If the normalized variable is different from that specified in the ECM= option or the COINTEG statement, then the value specified in the COINTEG statement is used.

      TYPE=value
         specifies the type of cointegration rank test to be printed. Valid values are as follows:

         MAX     prints the cointegration maximum eigenvalue test.
         TRACE   prints the cointegration trace test. This is the default.

         If the NOINT option is not specified, the procedure prints two different cointegration rank tests in the presence of the unrestricted and restricted deterministic terms (constant or linear trend) models. If the IORDER=2 option is specified, the procedure automatically uses the TYPE=TRACE option.

      Some examples illustrating the COINTTEST= option follow:

      model y1 y2 / p=2 cointtest=(johansen=(type=max normalize=y1));
      model y1 y2 / p=2 cointtest=(johansen=(iorder=2 normalize=y1));
   SIGLEVEL=value
      sets the size of cointegration rank tests and common trends tests. The SIGLEVEL=value can be set to 0.1, 0.05, or 0.01. The default is SIGLEVEL=0.05.

      model y1 y2 / p=2 cointtest=(johansen siglevel=0.1);
      model y1 y2 / p=2 cointtest=(sw siglevel=0.1);
   SW
   SW=(TYPE=value LAG=number )
      prints common trends tests for a multivariate time series based on the Stock-Watson method. This test is provided when the number of dependent (endogenous) variables is less than or equal to 6. See the section “Common Trends” on page 2150 for details.

      The following options can be used in the SW=( ) option. The options are listed within parentheses.

      LAG=number
         specifies the number of lags. The default is LAG=max(1,p) for the TYPE=FILTDIF or TYPE=FILTRES option, where p is the AR maximum order specified by the P= option; LAG=$T^{1/4}$ for the TYPE=KERNEL option, where T is the number of nonmissing observations. If the specified LAG=number exceeds the default, then it is replaced by the default.
      TYPE=value
         specifies the type of common trends test to be printed. Valid values are as follows:

         FILTDIF   prints the common trends test based on the filtering method applied to the differenced series. This is the default.
         FILTRES   prints the common trends test based on the filtering method applied to the residual series.
         KERNEL    prints the common trends test based on the kernel method.

      model y1 y2 / p=2 cointtest=(sw);
      model y1 y2 / p=2 cointtest=(sw=(type=kernel));
      model y1 y2 / p=2 cointtest=(sw=(type=kernel lag=3));
Bayesian VARX Estimation Options

PRIOR
PRIOR=(prior-options)
   specifies the prior value of parameters for the BVARX(p,s) model. The BVARX model allows for a subset model specification. If the ECM= option is specified with the PRIOR option, the BVECMX(p,s) form is fitted. To compute the standard errors of the forecasts, a bootstrap procedure is used. See the section “Bayesian VAR and VARX Modeling” on page 2139 for details.

   The following options can be used with the PRIOR=(prior-options) option. The prior-options are listed within parentheses.

   IVAR
   IVAR=(variables)
      specifies an integrated BVAR(p) model. The variables should be specified in the MODEL statement as dependent variables. If you use the IVAR option without variables, then it sets the overall prior mean of the first lag of each variable equal to one in its own equation and sets all other coefficients to zero. If variables are specified, it sets the prior mean of the first lag of the specified variables equal to one in its own equation and sets all other coefficients to zero.

      When the series $y_t = (y_1, y_2)'$ follows a bivariate BVAR(2) process, the IVAR or IVAR=(y1 y2) option is equivalent to specifying MEAN=(1 0 0 0 0 1 0 0).

      If the PRIOR=(MEAN=) or ECM= option is specified, the IVAR= option is ignored.

   LAMBDA=value
      specifies the prior standard deviation of the AR coefficient parameter matrices. It should be a positive number. The default is LAMBDA=1. As the value of the LAMBDA= option is increased, the BVAR(p) model becomes closer to a VAR(p) model.

   MEAN=(vector )
      specifies the mean vector in the prior distribution for the AR coefficients. If the vector is not specified, the prior value is assumed to be a zero vector. See the section “Bayesian VAR and VARX Modeling” on page 2139 for details.

      You can specify the mean vector by order of the equation. Let $(\delta, \Phi_1, \ldots, \Phi_p)$ be the parameter sets to be estimated and $\Phi = (\Phi_1, \ldots, \Phi_p)$ be the AR parameter sets. The mean vector is specified row-wise from Φ; that is, the MEAN=(vec(Φ′)) option. For the PRIOR=(MEAN=) option in the BVAR(2),

      $$ \Phi = \begin{pmatrix} \phi_{1,11} & \phi_{1,12} & \phi_{2,11} & \phi_{2,12} \\ \phi_{1,21} & \phi_{1,22} & \phi_{2,21} & \phi_{2,22} \end{pmatrix} = \begin{pmatrix} 2 & 0.1 & 1 & 0 \\ 0.5 & 3 & 0 & -1 \end{pmatrix} $$

      where $\phi_{l,ij}$ is an element of Φ, l is a lag, i is associated with the first dependent variable, and j is associated with the second dependent variable.

      model y1 y2 / p=2 prior=(mean=(2 0.1 1 0 0.5 3 0 -1));
      The deterministic terms and exogenous variables are considered to shrink toward zero; you must omit prior means of exogenous variables and deterministic terms such as a constant, seasonal dummies, or trends.

      For a Bayesian error correction model estimated when both the ECM= and PRIOR= options are used, a mean vector for only the lagged AR coefficients, $\Phi_i^*$, in terms of the regressors $\Delta y_{t-i}$ for $i = 1, \ldots, (p-1)$, is used in the VECM(p) representation. The diffused prior variance of α is used, since β is replaced by $\hat{\beta}$ estimated in a nonconstrained VECM(p) form.

      $$ \Delta y_t = \alpha z_{t-1} + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + A D_t + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t $$

      where $z_t = \beta' y_t$.

      For example, in the case of a bivariate (k = 2) BVECM(2) form, the option is MEAN=($\phi_{1,11}^*$ $\phi_{1,12}^*$ $\phi_{1,21}^*$ $\phi_{1,22}^*$), where $\phi_{1,ij}^*$ is the (i,j)th element of the matrix $\Phi_1^*$.
   NREP=number
      specifies the number of periods to compute the measure of forecast accuracy. The default is NREP=0.5T, where T is the number of observations.

   THETA=value
      specifies the prior standard deviation of the AR coefficient parameter matrices. The value is in the interval (0,1). The default is THETA=0.1. As the value of the THETA= option approaches 1, the specified BVAR(p) model approaches a VAR(p) model.

   Some examples of the PRIOR= option follow:

   model y1 y2 / p=2 prior;
   model y1 y2 / p=2 prior=(theta=0.2 lambda=5);
   model y1 y2 = x1 / p=2 prior=(theta=0.2 lambda=5);
   model y1 y2 = x1 / p=2 prior=(theta=0.2 lambda=5 mean=(2 0.1 1 0 0.5 3 0 -1));
See the section “Bayesian VAR and VARX Modeling” on page 2139 for details.
Vector Error Correction Model Options

ECM=(RANK=number NORMALIZE=variable ECTREND )
   specifies a vector error correction model. The following options can be used in the ECM=( ) option. The options are specified within parentheses.

   NORMALIZE=variable
      specifies a single dependent variable name whose cointegrating vectors are normalized. If the variable name is different from that specified in the COINTEG statement, then the value specified in the COINTEG statement is used.

   RANK=number
      specifies the cointegration rank. This option is required in the ECM= option. The value of the RANK= option should be greater than zero and less than or equal to the number of dependent (endogenous) variables, k. If the rank is different from that specified in the COINTEG statement, then the value specified in the COINTEG statement is used.
   ECTREND
      specifies the restriction on the drift in the VECM(p) form.

      There is no separate drift in the VECM(p) form, but a constant enters only through the error correction term.

      $$ \Delta y_t = \alpha (\beta', \beta_0)(y_{t-1}', 1)' + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + \epsilon_t $$

      An example of the ECTREND option follows:

      model y1 y2 / p=2 ecm=(rank=1 ectrend);

      There is a separate drift and no separate linear trend in the VECM(p) form, but a linear trend enters only through the error correction term.

      $$ \Delta y_t = \alpha (\beta', \beta_1)(y_{t-1}', t)' + \sum_{i=1}^{p-1} \Phi_i^* \Delta y_{t-i} + \delta_0 + \epsilon_t $$

      An example of the ECTREND option with the TREND= option follows:

      model y1 y2 / p=2 ecm=(rank=1 ectrend) trend=linear;

      If the NSEASON= option is specified, then the NSEASON= option is ignored; if the NOINT option is specified, then the ECTREND option is ignored.

   Some examples of the ECM= option follow:

   model y1 y2 / p=2 ecm=(rank=1 normalize=y1);
   model y1 y2 / p=2 ecm=(rank=1 ectrend) trend=linear;
See the section “Vector Error Correction Modeling” on page 2153 for details.
GARCH Statement

GARCH options ;

The GARCH statement specifies a GARCH-type multivariate conditional heteroscedasticity model. The following options can be used in the GARCH statement.

FORM=value
   specifies the representation for a GARCH model. Valid values are as follows:

   BEKK   specifies a BEKK representation. This is the default.
   CCC    specifies a constant conditional correlation representation.

OUTHT=SAS-data-set
   writes the conditional covariance matrix to an output data set.
P=number
P=(number-list)
   specifies the order of the process or the subset of GARCH terms to be fitted. For example, you can specify the P=(1,3) option. The P=3 option is equivalent to the P=(1,2,3) option. The default is P=0.

Q=number
Q=(number-list)
   specifies the order of the process or the subset of ARCH terms to be fitted. This option is required in the GARCH statement. For example, you can specify the Q=(2) option. The Q=2 option is equivalent to the Q=(1,2) option.

For the VAR(1)-ARCH(1) model,

   model y1 y2 / p=1;
   garch q=1 form=bekk;

For the multivariate GARCH(1,1) model,

   model y1 y2;
   garch q=1 p=1 form=ccc;

Other multivariate GARCH-type models are

   model y1 y2 = x1 / xlag=1;
   garch q=1;

   model y1 y2 / q=1;
   garch q=1 p=1;
See the section “Multivariate GARCH Modeling” on page 2172 for details.
NLOPTIONS Statement

NLOPTIONS options ;

The VARMAX procedure uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks. For a list of all the options of the NLOPTIONS statement, see Chapter 6, “Nonlinear Optimization Methods.”

An example of the NLOPTIONS statement follows:

proc varmax data=one;
   nloptions tech=qn;
   model y1 y2 / p=2;
run;
The VARMAX procedure uses the dual quasi-Newton optimization method by default when no NLOPTIONS statement is specified. However, it uses Newton-Raphson ridge optimization when the NLOPTIONS statement is specified. The following example uses the TECH=QUANEW method by default.

proc varmax data=one;
   model y1 y2 / p=2 method=ml;
run;

The next example uses the TECH=NRRIDG method by default.

proc varmax data=one;
   nloptions maxiter=500 maxfunc=5000;
   model y1 y2 / p=2 method=ml;
run;
OUTPUT Statement

OUTPUT < options > ;

The OUTPUT statement generates and prints forecasts based on the model estimated in the previous MODEL statement and, optionally, creates an output SAS data set that contains these forecasts. When the GARCH model is estimated, the upper and lower confidence limits of forecasts are calculated by assuming that the error covariance has homoscedastic conditional covariance.

ALPHA=number
   sets the forecast confidence limit size, where number is between 0 and 1. When you specify the ALPHA=number option, the upper and lower confidence limits define the 100(1−α)% confidence interval. The default is ALPHA=0.05, which produces 95% confidence intervals.
specifies the number of observations before the end of the data at which the multistep forecasts begin. The BACK= option value must be less than or equal to the number of observations minus the number of lagged regressors in the model. The default is BACK=0, which means that the forecasts start at the end of the available data. LEAD=number
specifies the number of multistep forecast values to compute. The default is LEAD=12. NOPRINT
suppresses the printed forecast values of each dependent (endogenous) variable. OUT=SAS-data-set
writes the forecast values to an output data set. Some examples of the OUTPUT statements follow:
proc varmax data=one;
   model y1 y2 / p=2;
   output lead=6 back=2;
run;

proc varmax data=one;
   model y1 y2 / p=2;
   output out=for noprint;
run;
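Similarly, confidence limits other than the default 95% limits can be requested by combining the ALPHA= option with the other OUTPUT statement options. The following is a minimal variation on the preceding examples; the output data set name fcst is arbitrary:

proc varmax data=one;
   model y1 y2 / p=2;
   output lead=8 alpha=0.10 out=fcst;   /* 90% confidence limits */
run;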
RESTRICT Statement

RESTRICT restriction, . . . , restriction ;

The RESTRICT statement restricts the specified parameters to the specified values. Only one RESTRICT statement is allowed, but multiple restrictions can be specified in one RESTRICT statement. The restriction’s form is parameter=value, and each restriction is separated by commas.

Parameters are referred to by the following keywords:

CONST(i)
   is the intercept parameter of the ith time series $y_{it}$.

AR(l,i,j)
   is the autoregressive parameter of the lag l value of the jth dependent (endogenous) variable, $y_{j,t-l}$, to the ith dependent variable at time t, $y_{it}$.

MA(l,i,j)
   is the moving-average parameter of the lag l value of the jth error process, $\epsilon_{j,t-l}$, to the ith dependent variable at time t, $y_{it}$.

XL(l,i,j)
   is the exogenous parameter of the lag l value of the jth exogenous (independent) variable, $x_{j,t-l}$, to the ith dependent variable at time t, $y_{it}$.

SDUMMY(i,j)
   is the jth seasonal dummy of the ith time series at time t, $y_{it}$, where j = 1, ..., (nseason−1) and nseason is based on the NSEASON= option in the MODEL statement.

LTREND(i)
   is the linear trend parameter of the current value of the ith time series $y_{it}$.

QTREND(i)
   is the quadratic trend parameter of the current value of the ith time series $y_{it}$.

The following keywords are for the fitted GARCH model. The indexes i and j refer to the position of the element in the coefficient matrix.

GCHC(i,j)
   is the constant parameter of the covariance matrix, $H_t$; (i,j) satisfies 1 ≤ i = j ≤ k for the CCC representation and 1 ≤ i ≤ j ≤ k for the BEKK representation, where k is the number of dependent variables.

ACH(l,i,j)
   is the ARCH parameter of the lag l value of $\epsilon_t \epsilon_t'$, where i, j = 1, ..., k for the BEKK representation and i = j = 1, ..., k for the CCC representation.

GCH(l,i,j)
   is the GARCH parameter of the lag l value of the covariance matrix, $H_t$, where i, j = 1, ..., k for the BEKK representation and i = j = 1, ..., k for the CCC representation.

CCC(i,j)
   is the constant conditional correlation parameter for only the CCC representation; (i,j) satisfies 1 ≤ i < j ≤ k.

To use the RESTRICT statement, you need to know the form of the model. If the P=, Q=, and XLAG= options are not specified, then the RESTRICT statement is not applicable.

Restricted parameter estimates are computed by introducing a Lagrangian parameter for each restriction (Pringle and Rayner 1971). The Lagrangian parameter measures the sensitivity of the sum of square errors to the restriction. The estimates of these Lagrangian parameters and their significance are printed in the restriction results table.

The following are examples of the RESTRICT statement. The first example shows a bivariate (k=2) VAR(2) model:

proc varmax data=one;
   model y1 y2 / p=2;
   restrict AR(1,1,2)=0, AR(2,1,2)=0.3;
run;

The AR(1,1,2) and AR(2,1,2) parameters are fixed as AR(1,1,2)=0 and AR(2,1,2)=0.3, respectively, and the other parameters are to be estimated.

The following shows a bivariate (k=2) VARX(1,1) model with three exogenous variables:

proc varmax data=two;
   model y1 = x1 x2, y2 = x2 x3 / p=1 xlag=1;
   restrict XL(0,1,1)=-1.2, XL(1,2,3)=0;
run;

The XL(0,1,1) and XL(1,2,3) parameters are fixed as XL(0,1,1)=−1.2 and XL(1,2,3)=0, respectively, and the other parameters are to be estimated.
TEST Statement

TEST restriction, . . . , restriction ;

The TEST statement performs the Wald test for the joint hypothesis specified in the statement. The restriction’s form is parameter=value, and each restriction is separated by commas. The restrictions are specified in the same manner as in the RESTRICT statement. See the RESTRICT statement for a description of the model parameter naming conventions used by the RESTRICT and TEST statements. Any number of TEST statements can be specified.

To use the TEST statement, you need to know the form of the model. If the P=, Q=, and XLAG= options are not specified, then the TEST statement is not applicable.
See the section “Granger Causality Test” on page 2136 for the Wald test.

The following is an example of the TEST statement. In the case of a bivariate (k=2) VAR(2) model:

proc varmax data=one;
   model y1 y2 / p=2;
   test AR(1,1,2)=0, AR(2,1,2)=0;
run;
After estimating the parameters, the TEST statement tests the null hypothesis that AR(1,1,2)=0 and AR(2,1,2)=0.
Details: VARMAX Procedure
Missing Values

The VARMAX procedure currently does not support missing values. The procedure uses the first contiguous group of observations with no missing values for any of the MODEL statement variables. Observations at the beginning of the data set with missing values for any MODEL statement variables are not used or included in the output data set. At the end of the data set, observations can have dependent (endogenous) variables with missing values and independent (exogenous) variables with nonmissing values.
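For example, to treat future values of an exogenous variable as known while the dependent variables are forecast, the input data set can end with observations in which only the exogenous variable is nonmissing, as described above. The following is a minimal sketch under that assumption; the data set names hist, future, extended, and fcst and all variable names are hypothetical:

/* future contains known values of x1 with y1 and y2 missing */
data extended;
   set hist future;
run;

proc varmax data=extended;
   model y1 y2 = x1 / p=1;
   output out=fcst lead=2;
run;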
VARMAX Model

The vector autoregressive moving-average model with exogenous variables is called the VARMAX(p,q,s) model. The form of the model can be written as

$$ y_t = \sum_{i=1}^{p} \Phi_i y_{t-i} + \sum_{i=0}^{s} \Theta_i^* x_{t-i} + \epsilon_t - \sum_{i=1}^{q} \Theta_i \epsilon_{t-i} $$

where the output variables of interest, $y_t = (y_{1t}, \ldots, y_{kt})'$, can be influenced by other input variables, $x_t = (x_{1t}, \ldots, x_{rt})'$, which are determined outside of the system of interest. The variables $y_t$ are referred to as dependent, response, or endogenous variables, and the variables $x_t$ are referred to as independent, input, predictor, regressor, or exogenous variables. The unobserved noise variables, $\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{kt})'$, are a vector white noise process.

The VARMAX(p,q,s) model can be written

$$ \Phi(B) y_t = \Theta^*(B) x_t + \Theta(B) \epsilon_t $$

where

$$ \Phi(B) = I_k - \Phi_1 B - \cdots - \Phi_p B^p $$
$$ \Theta^*(B) = \Theta_0^* + \Theta_1^* B + \cdots + \Theta_s^* B^s $$
$$ \Theta(B) = I_k - \Theta_1 B - \cdots - \Theta_q B^q $$

are matrix polynomials in the backshift operator B, such that $B^i y_t = y_{t-i}$, the $\Phi_i$ and $\Theta_i$ are k × k matrices, and the $\Theta_i^*$ are k × r matrices.

The following assumptions are made: $E(\epsilon_t) = 0$, $E(\epsilon_t \epsilon_t') = \Sigma$, which is positive definite, and $E(\epsilon_t \epsilon_s') = 0$ for $t \ne s$. For stationarity and invertibility of the VARMAX process, the roots of $|\Phi(z)| = 0$ and $|\Theta(z)| = 0$ are outside the unit circle. The exogenous (independent) variables $x_t$ are not correlated with the residuals $\epsilon_t$; that is, $E(x_t \epsilon_t') = 0$. The exogenous variables can be stochastic or nonstochastic. When the exogenous variables are stochastic and their future values are unknown, forecasts of these future values are needed to forecast the future values of the endogenous (dependent) variables. On occasion, future values of the exogenous variables can be assumed to be known because they are deterministic variables. The VARMAX procedure assumes that the exogenous variables are nonstochastic if future values are available in the input data set. Otherwise, the exogenous variables are assumed to be stochastic and their future values are forecasted by assuming that they follow the VARMA(p,q) model, prior to forecasting the endogenous variables, where p and q are the same as in the VARMAX(p,q,s) model.
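For concreteness, a bivariate VAR(1) model (that is, a VARMAX(1,0,0) model with k = 2 and no exogenous variables) is the special case

$$ \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix} $$

in which each variable depends on one lag of itself and of the other variable.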
State-Space Representation

Another representation of the VARMAX(p,q,s) model is in the form of a state-variable or a state-space model, which consists of a state equation

$$ z_t = F z_{t-1} + K x_t + G \epsilon_t $$

and an observation equation

$$ y_t = H z_t $$

where

$$ z_t = \begin{pmatrix} y_t \\ \vdots \\ y_{t-p+1} \\ x_t \\ \vdots \\ x_{t-s+1} \\ \epsilon_t \\ \vdots \\ \epsilon_{t-q+1} \end{pmatrix}, \quad K = \begin{pmatrix} \Theta_0^* \\ 0_{k \times r} \\ \vdots \\ 0_{k \times r} \\ I_r \\ 0_{r \times r} \\ \vdots \\ 0_{r \times r} \\ 0_{k \times r} \\ \vdots \\ 0_{k \times r} \end{pmatrix}, \quad G = \begin{pmatrix} I_k \\ 0_{k \times k} \\ \vdots \\ 0_{k \times k} \\ 0_{r \times k} \\ \vdots \\ 0_{r \times k} \\ I_k \\ 0_{k \times k} \\ \vdots \\ 0_{k \times k} \end{pmatrix} $$

$$
F = \begin{pmatrix}
\Phi_1 & \cdots & \Phi_{p-1} & \Phi_p & \Theta_1^* & \cdots & \Theta_{s-1}^* & \Theta_s^* & -\Theta_1 & \cdots & -\Theta_{q-1} & -\Theta_q \\
I_k & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & I_k & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & I_r & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & I_r & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & I_k & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & I_k & 0
\end{pmatrix}
$$
and

$$ H = [\, I_k, 0_{k \times k}, \ldots, 0_{k \times k}, 0_{k \times r}, \ldots, 0_{k \times r}, 0_{k \times k}, \ldots, 0_{k \times k} \,] $$

On the other hand, it is assumed that $x_t$ follows a VARMA(p,q) model

$$ x_t = \sum_{i=1}^{p} A_i x_{t-i} + a_t - \sum_{i=1}^{q} C_i a_{t-i} $$

The model can also be expressed as $A(B) x_t = C(B) a_t$, where $A(B) = I_r - A_1 B - \cdots - A_p B^p$ and $C(B) = I_r - C_1 B - \cdots - C_q B^q$ are matrix polynomials in B, and the $A_i$ and $C_i$ are r × r matrices. Without loss of generality, the AR and MA orders can be taken to be the same as in the VARMAX(p,q,s) model, and $a_t$ and $\epsilon_t$ are independent white noise processes.

Under suitable conditions such as stationarity, $x_t$ is represented by an infinite order moving-average process

$$ x_t = A(B)^{-1} C(B) a_t = \Psi^x(B) a_t = \sum_{j=0}^{\infty} \Psi_j^x a_{t-j} $$

where $\Psi^x(B) = A(B)^{-1} C(B) = \sum_{j=0}^{\infty} \Psi_j^x B^j$.

The optimal minimum mean squared error (minimum MSE) i-step-ahead forecast of $x_{t+i}$ is

$$ x_{t+i|t} = \sum_{j=i}^{\infty} \Psi_j^x a_{t+i-j} $$

$$ x_{t+i|t+1} = x_{t+i|t} + \Psi_{i-1}^x a_{t+1} $$

For i > q,

$$ x_{t+i|t} = \sum_{j=1}^{p} A_j x_{t+i-j|t} $$
The VARMAX(p,q,s) model has an absolutely convergent representation as

$$
\begin{aligned}
y_t &= \Phi(B)^{-1} \Theta^*(B) x_t + \Phi(B)^{-1} \Theta(B) \epsilon_t \\
    &= \Psi^*(B) \Psi^x(B) a_t + \Phi(B)^{-1} \Theta(B) \epsilon_t \\
    &= V(B) a_t + \Psi(B) \epsilon_t
\end{aligned}
$$

or

$$ y_t = \sum_{j=0}^{\infty} V_j a_{t-j} + \sum_{j=0}^{\infty} \Psi_j \epsilon_{t-j} $$

where $\Psi(B) = \Phi(B)^{-1} \Theta(B) = \sum_{j=0}^{\infty} \Psi_j B^j$, $\Psi^*(B) = \Phi(B)^{-1} \Theta^*(B)$, and $V(B) = \Psi^*(B) \Psi^x(B) = \sum_{j=0}^{\infty} V_j B^j$.

The optimal (minimum MSE) i-step-ahead forecast of $y_{t+i}$ is

$$ y_{t+i|t} = \sum_{j=i}^{\infty} V_j a_{t+i-j} + \sum_{j=i}^{\infty} \Psi_j \epsilon_{t+i-j} $$

$$ y_{t+i|t+1} = y_{t+i|t} + V_{i-1} a_{t+1} + \Psi_{i-1} \epsilon_{t+1} $$

for $i = 1, \ldots, v$ with $v = \max(p, q+1)$.
yt Ci jt
D
D
D
D
p X j D1 p X j D1 p X j D1 p X
ˆj yt Ci
j jt
s X
C
‚j xt Ci
j jt
j D0
ˆj yt Ci
j jt
C
‚0 xtCi jt
C
s X
‚j xt Ci
j jt
j D1
ˆj yt Ci
j jt
p X
C ‚0
Aj xt Ci
j jt
j D1
ˆj yt Ci
j jt
j D1
u X
C
s X
C
‚j xt Ci
j jt
j D1
.‚0 Aj C ‚j /xt Ci
j jt
j D1
where u D max.p; s/. Define …j D ‚0 Aj C ‚j . For i D v > q with v D max.p; q C 1/, you obtain
yt Cvjt yt Cvjt
D
D
p X j D1 p X j D1
ˆj yt Cv
j jt
C
u X
…j xt Cv
j jt
for u v
…j xt Cv
j jt
for u > v
j D1
ˆj yt Cv
j jt
C
r X j D1
and V .B/ D
2108 F Chapter 32: The VARMAX Procedure
From the preceding relations, a state equation is zt C1 D F zt C Kxt C Get C1 and an observation equation is yt D H zt where 2
yt
3
6 ytC1jt 7 6 7 2 3 6 7 :: xt Cv u 6 7 : 6 7 6xt Cv uC1 7 6ytCv 1jt 7 a 6 7 t C1 7 zt D 6 7 ; etC1 D :: 6 xt 7 ; xt D 6 t C1 4 5 : 6 7 6 x 7 xt 1 6 tC1jt 7 6 7 :: 4 5 : xtCv 1jt 2 0 Ik 0 0 0 0 0 60 0 I 0 0 0 0 k 6 6 :: :: :: : : : :: : : : :: :: :: 6 : : : : 6 6ˆv ˆv 1 ˆv 2 ˆ1 …v …v 1 …v 2 F D6 60 0 0 0 0 Ir 0 6 60 0 0 0 0 0 I r 6 6 :: :: :: :: :: :: :: : : 4 : : : : : : : : 0 0 0 0 Av Av 1 Av 2 2 3 2 3 V0 Ik 0 0 0 6 V1 ‰1 7 6 0 6 7 0 0 7 6 7 6 :: :: 7 6 :: 7 6 :: : : : 7 :: :: 7 6 : 6 : 7 : 7 6 Vv 1 ‰v 1 7 6 6 7 6 7 K D 6…u …u 1 …vC1 7 ; G D 6 Ir 0rk 7 6 0 7 6 7 0 0 7 6 6 ‰x 7 0 rk 6 :: 7 6 7 1 :: :: :: 6 7 4 : 5 : : : : : : : 4 : : 5 0 0 0 ‰vx 1 0rk
:: : :: :
0 0 :: :
3
7 7 7 7 7 …1 7 7 0 7 7 0 7 7 :: 7 : 5 A1
and H D ŒIk ; 0kk ; : : : ; 0kk ; 0kr ; : : : ; 0kr Note that the matrix K and the input vector xt are defined only when u > v.
Dynamic Simultaneous Equations Modeling

In the econometrics literature, the VARMAX(p,q,s) model is sometimes written in a form that is slightly different than the one shown in the previous section. This alternative form is referred to as a dynamic simultaneous equations model or a dynamic structural equations model.

Since $E(\epsilon_t \epsilon_t') = \Sigma$ is assumed to be positive definite, there exists a lower triangular matrix $A_0$ with ones on the diagonal such that $A_0 \Sigma A_0' = \Sigma_d$, where $\Sigma_d$ is a diagonal matrix with positive diagonal elements.

$$ A_0 y_t = \sum_{i=1}^{p} A_i y_{t-i} + \sum_{i=0}^{s} C_i^* x_{t-i} + C_0 \epsilon_t - \sum_{i=1}^{q} C_i \epsilon_{t-i} $$

where $A_i = A_0 \Phi_i$, $C_i^* = A_0 \Theta_i^*$, $C_0 = A_0$, and $C_i = A_0 \Theta_i$.

As an alternative form,

$$ A_0 y_t = \sum_{i=1}^{p} A_i y_{t-i} + \sum_{i=0}^{s} C_i^* x_{t-i} + a_t - \sum_{i=1}^{q} C_i a_{t-i} $$

where $A_i = A_0 \Phi_i$, $C_i^* = A_0 \Theta_i^*$, $C_i = A_0 \Theta_i A_0^{-1}$, and $a_t = C_0 \epsilon_t$ has a diagonal covariance matrix $\Sigma_d$. The PRINT=(DYNAMIC) option returns the parameter estimates that result from estimating the model in this form.

A dynamic simultaneous equations model involves a leading (lower triangular) coefficient matrix for $y_t$ at lag 0 or a leading coefficient matrix for $\epsilon_t$ at lag 0. Such a representation of the VARMAX(p,q,s) model can be more useful in certain circumstances than the standard representation. From the linear combination of the dependent variables obtained by $A_0 y_t$, you can easily see the relationship between the dependent variables at the current time.

The following statements provide the dynamic simultaneous equations of the VAR(1) model.

proc iml;
   sig = {1.0 0.5, 0.5 1.25};
   phi = {1.2 -0.5, 0.6 0.3};
   /* simulate the vector time series */
   call varmasim(y,phi) sigma = sig n = 100 seed = 34657;
   cn = {'y1' 'y2'};
   create simul1 from y[colname=cn];
   append from y;
quit;

data simul1;
   set simul1;
   date = intnx( 'year', '01jan1900'd, _n_-1 );
   format date year4.;
run;

proc varmax data=simul1;
   model y1 y2 / p=1 noint print=(dynamic);
run;
This is the same data set and model used in the section “Getting Started: VARMAX Procedure” on page 2050. You can compare the results of the VARMA model form and the dynamic simultaneous equations model form.
2110 F Chapter 32: The VARMAX Procedure
Figure 32.25 Dynamic Simultaneous Equations (DYNAMIC Option) The VARMAX Procedure Covariances of Innovations Variable y1 y2
y1
y2
1.28875 0.00000
0.00000 1.29578
AR Lag 0 1
Variable y1 y2 y1 y2
y1
y2
1.00000 -0.30845 1.15977 0.18861
0.00000 1.00000 -0.51058 0.54247
Dynamic Model Parameter Estimates
Equation Parameter
Estimate
y1
1.15977 -0.51058 0.30845 0.18861 0.54247
AR1_1_1 AR1_1_2 AR0_2_1 AR1_2_1 AR1_2_2
y2
Standard Error t Value Pr > |t| Variable 0.05508 0.07140
21.06 -7.15
0.05779 0.07491
3.26 7.24
0.0001 y1(t-1) 0.0001 y2(t-1) y1(t) 0.0015 y1(t-1) 0.0001 y2(t-1)
In Figure 32.4 in the section “Getting Started: VARMAX Procedure” on page 2050, the covariance of t estimated from the VARMAX model form is † D
1:28875 0:39751 0:39751 1:41839
Figure 32.25 shows the results from estimating the model as a dynamic simultaneous equations model. By the decomposition of † , you get a diagonal matrix (†a ) and a lower triangular matrix (A0 ) such as †a D A0 † A00 where †a D
1:28875 0 0 1:29578
and A0 D
1 0 0:30845 1
The lower triangular matrix (A0 ) is shown in the left side of the simultaneous equations model. The parameter estimates in equations system are shown in the right side of the two-equations system.
Impulse Response Function F 2111
The simultaneous equations model is written as
1 0 0:30845 1
yt D
1:15977 0:18861
0:51058 0:54247
yt
1
C at
The resulting two-equation system can be written as y1t
D 1:15977y1;t
y2t
D 0:30845y1t C 0:18861y1;t
1
0:51058y2;t 1
1
C a1t
C 0:54247y2;t
1
C a2t
Impulse Response Function Simple Impulse Response Function (IMPULSE=SIMPLE Option) The VARMAX(p,q,s) model has a convergent representation yt D ‰ .B/xt C ‰.B/t where ‰ .B/ D ˆ.B/
1 ‚ .B/
D
P1
j j D0 ‰j B
and ‰.B/ D ˆ.B/
1 ‚.B/
D
P1
j D0 ‰j B
j.
The elements of the matrices ‰j from the operator ‰.B/, called the impulse response, can be interpreted as the impact that a shock in one variable has on another variable. Let j;i n be the i nt h element of ‰j at lag j , where i is the index for the impulse variable, and n is the index for the response variable (impulse ! response). For instance, j;11 is an impulse response to y1t ! y1t , and j;12 is an impulse response to y1t ! y2t .
Accumulated Impulse Response Function (IMPULSE=ACCUM Option) The accumulated impulse response function is the cumulative sum of the impulse response function, P ‰la D lj D0 ‰j .
Orthogonalized Impulse Response Function (IMPULSE=ORTH Option) The MA representation of a VARMA(p,q) model with a standardized white noise innovation process offers another way to interpret a VARMA(p,q) model. Since † is positive-definite, there is a lower triangular matrix P such that † D PP 0 . The alternate MA representation of a VARMA(p,q) model is written as yt D ‰ o .B/ut P o j o where ‰ o .B/ D 1 j D0 ‰j B , ‰j D ‰j P , and ut D P
1 . t
The elements of the matrices ‰jo , called the orthogonal impulse response, can be interpreted as the effects of the components of the standardized shock process ut on the process yt at lag j .
2112 F Chapter 32: The VARMAX Procedure
Impulse Response of Transfer Function (IMPULSX=SIMPLE Option) The coefficient matrix ‰j from the transfer function operator ‰ .B/ can be interpreted as the effects that changes in the exogenous variables xt have on the output variable yt at lag j ; it is called an impulse response matrix in the transfer function.
Impulse Response of Transfer Function (IMPULSX=ACCUM Option) The accumulated impulse response in the transfer function is the cumulative sum of the impulse P response in the transfer function, ‰la D lj D0 ‰j . The asymptotic distributions of the impulse functions can be seen in the section “VAR and VARX Modeling” on page 2133. The following statements provide the impulse response and the accumulated impulse response in the transfer function for a VARX(1,0) model. proc varmax data=grunfeld plot=impulse; model y1-y3 = x1 x2 / p=1 lagmax=5 printform=univariate print=(impulsx=(all) estimates); run;
Impulse Response Function F 2113
In Figure 32.26, the variables x1 and x2 are impulses and the variables y1, y2, and y3 are responses. You can read the table matching the pairs of impulse ! response such as x1 ! y1, x1 ! y2, x1 ! y3, x2 ! y1, x2 ! y2, and x2 ! y3. In the pair of x1 ! y1, you can see the long-run responses of y1 to an impulse in x1 (the values are 1.69281, 0.35399, 0.09090, and so on for lag 0, lag 1, lag 2, and so on, respectively). Figure 32.26 Impulse Response in Transfer Function (IMPULSX= Option) The VARMAX Procedure Simple Impulse Response of Transfer Function by Variable Variable Response\Impulse y1
y2
y3
Lag
x1
x2
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
1.69281 0.35399 0.09090 0.05136 0.04717 0.04620 -6.09850 -5.15484 -3.04168 -2.23797 -1.98183 -1.87415 -0.02317 1.57476 1.80231 1.77024 1.70435 1.63913
-0.00859 0.01727 0.00714 0.00214 0.00072 0.00040 2.57980 0.45445 0.04391 -0.01376 -0.01647 -0.01453 -0.01274 -0.01435 0.00398 0.01062 0.01197 0.01187
2114 F Chapter 32: The VARMAX Procedure
Figure 32.27 shows the responses of y1, y2, and y3 to a forecast error impulse in x1. Figure 32.27 Plot of Impulse Response in Transfer Function
Impulse Response Function F 2115
Figure 32.28 shows the accumulated impulse response in transfer function. Figure 32.28 Accumulated Impulse Response in Transfer Function (IMPULSX= Option) Accumulated Impulse Response of Transfer Function by Variable Variable Response\Impulse y1
y2
y3
Lag
x1
x2
0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5
1.69281 2.04680 2.13770 2.18906 2.23623 2.28243 -6.09850 -11.25334 -14.29502 -16.53299 -18.51482 -20.38897 -0.02317 1.55159 3.35390 5.12414 6.82848 8.46762
-0.00859 0.00868 0.01582 0.01796 0.01867 0.01907 2.57980 3.03425 3.07816 3.06440 3.04793 3.03340 -0.01274 -0.02709 -0.02311 -0.01249 -0.00052 0.01135
2116 F Chapter 32: The VARMAX Procedure
Figure 32.29 shows the accumulated responses of y1, y2, and y3 to a forecast error impulse in x1. Figure 32.29 Plot of Accumulated Impulse Response in Transfer Function
The following statements provide the impulse response function, the accumulated impulse response function, and the orthogonalized impulse response function with their standard errors for a VAR(1) model. Parts of the VARMAX procedure output are shown in Figure 32.30, Figure 32.32, and Figure 32.34. proc varmax data=simul1 plot=impulse; model y1 y2 / p=1 noint lagmax=5 print=(impulse=(all)) printform=univariate; run;
Impulse Response Function F 2117
Figure 32.30 is the output in a univariate format associated with the PRINT=(IMPULSE=) option for the impulse response function. The keyword STD stands for the standard errors of the elements. The matrix in terms of the lag 0 does not print since it is the identity. In Figure 32.30, the variables y1 and y2 of the first row are impulses, and the variables y1 and y2 of the first column are responses. You can read the table matching the i mpulse ! response pairs, such as y1 ! y1, y1 ! y2, y2 ! y1, and y2 ! y2. For example, in the pair of y1 ! y1 at lag 3, the response is 0.8055. This represents the impact on y1 of one-unit change in y1 after 3 periods. As the lag gets higher, you can see the long-run responses of y1 to an impulse in itself. Figure 32.30 Impulse Response Function (IMPULSE= Option) The VARMAX Procedure Simple Impulse Response by Variable Variable Response\Impulse y1
y2
Lag
y1
y2
1 STD 2 STD 3 STD 4 STD 5 STD 1 STD 2 STD 3 STD 4 STD 5 STD
1.15977 0.05508 1.06612 0.10450 0.80555 0.14522 0.47097 0.17191 0.14315 0.18214 0.54634 0.05779 0.84396 0.08481 0.90738 0.10307 0.78943 0.12318 0.56123 0.14236
-0.51058 0.05898 -0.78872 0.10702 -0.84798 0.14121 -0.73776 0.15864 -0.52450 0.16115 0.38499 0.06188 -0.13073 0.08556 -0.48124 0.09865 -0.64856 0.11661 -0.65275 0.13482
2118 F Chapter 32: The VARMAX Procedure
Figure 32.31 shows the responses of y1 and y2 to a forecast error impulse in y1 with two standard errors. Figure 32.31 Plot of Impulse Response
Impulse Response Function F 2119
Figure 32.32 is the output in a univariate format associated with the PRINT=(IMPULSE=) option for the accumulated impulse response function. The matrix in terms of the lag 0 does not print since it is the identity. Figure 32.32 Accumulated Impulse Response Function (IMPULSE= Option) Accumulated Impulse Response by Variable Variable Response\Impulse y1
y2
Lag
y1
y2
1 STD 2 STD 3 STD 4 STD 5 STD 1 STD 2 STD 3 STD 4 STD 5 STD
2.15977 0.05508 3.22589 0.21684 4.03144 0.52217 4.50241 0.96922 4.64556 1.51137 0.54634 0.05779 1.39030 0.17614 2.29768 0.36166 3.08711 0.65129 3.64834 1.07510
-0.51058 0.05898 -1.29929 0.22776 -2.14728 0.53649 -2.88504 0.97088 -3.40953 1.47122 1.38499 0.06188 1.25426 0.18392 0.77302 0.36874 0.12447 0.65333 -0.52829 1.06309
2120 F Chapter 32: The VARMAX Procedure
Figure 32.33 shows the accumulated responses of y1 and y2 to a forecast error impulse in y1 with two standard errors. Figure 32.33 Plot of Accumulated Impulse Response
Impulse Response Function F 2121
Figure 32.34 is the output in a univariate format associated with the PRINT=(IMPULSE=) option for the orthogonalized impulse response function. The two right-hand side columns, y1 and y2, represent the y1_i nnovat i on and y2_i nnovation variables. These are the impulses variables. The left-hand side column contains responses variables, y1 and y2. You can read the table by matching the i mpulse ! response pairs such as y1_i nnovation ! y1, y1_i nnovation ! y2, y2_i nnovat i on ! y1, and y2_i nnovation ! y2. Figure 32.34 Orthogonalized Impulse Response Function (IMPULSE= Option) Orthogonalized Impulse Response by Variable Variable Response\Impulse y1
y2
Lag
y1
y2
0 STD 1 STD 2 STD 3 STD 4 STD 5 STD 0 STD 1 STD 2 STD 3 STD 4 STD 5 STD
1.13523 0.08068 1.13783 0.10666 0.93412 0.13113 0.61756 0.15348 0.27633 0.16940 -0.02115 0.17432 0.35016 0.11676 0.75503 0.06949 0.91231 0.10553 0.86158 0.12266 0.66909 0.13305 0.40856 0.14189
0.00000 0.00000 -0.58120 0.14110 -0.89782 0.16776 -0.96528 0.18595 -0.83981 0.19230 -0.59705 0.18830 1.13832 0.08855 0.43824 0.10937 -0.14881 0.13565 -0.54780 0.14825 -0.73827 0.15846 -0.74304 0.16765
In Figure 32.4, there is a positive correlation between "1t and "2t . Therefore, shock in y1 can be accompanied by a shock in y2 in the same period. For example, in the pair of y1_i nnovation ! y2, you can see the long-run responses of y2 to an impulse in y1_i nnovation.
2122 F Chapter 32: The VARMAX Procedure
Figure 32.35 shows the orthogonalized responses of y1 and y2 to a forecast error impulse in y1 with two standard errors. Figure 32.35 Plot of Orthogonalized Impulse Response
Forecasting The optimal (minimum MSE) l-step-ahead forecast of yt Cl is
yt Cljt
yt Cljt
D
D
p X
ˆj yt Cl
j jt
C
s X
j D1
j D0
p X
s X
j D1
ˆj yt Cl
j jt
C
‚j xtCl
j jt
q X
‚j t Cl
j;
l q
j Dl
‚j xt Cl
j jt ;
l >q
j D0
with yt Cl j jt D yt Cl j and xt Cl j jt D xt Cl j for l j . For the forecasts xt Cl section “State-Space Representation” on page 2105.
j jt ,
see the
Forecasting F 2123
Covariance Matrices of Prediction Errors without Exogenous (Independent) Variables Under the stationarity assumption, the optimal (minimum MSE) l-step-ahead forecast of yt Cl has P1 an infinite moving-average form, yt Cljt D j Dl ‰j t Cl j . The prediction error of the optimal P 1 l-step-ahead forecast is et Cljt D yt Cl yt Cljt D lj D0 ‰j t Cl j , with zero mean and covariance matrix, †.l/ D Cov.et Cljt / D
l 1 X
‰j †‰j0 D
j D0
l 1 X
‰jo ‰jo
0
j D0
where ‰jo D ‰j P with a lower triangular matrix P such that † D PP 0 . Under the assumption of normality of the t , the l-step-ahead prediction error et Cljt is also normally distributed as multivariate N.0; †.l//. Hence, it follows that the diagonal elements i2i .l/ of †.l/ can be used, together with the point forecasts yi;t Cljt , to construct l-step-ahead prediction intervals of the future values of the component series, yi;t Cl . The following statements use the COVPE option to compute the covariance matrices of the prediction errors for a VAR(1) model. The parts of the VARMAX procedure output are shown in Figure 32.36 and Figure 32.37. proc varmax data=simul1; model y1 y2 / p=1 noint lagmax=5 printform=both print=(decompose(5) impulse=(all) covpe(5)); run;
Figure 32.36 is the output in a matrix format associated with the COVPE option for the prediction error covariance matrices. Figure 32.36 Covariances of Prediction Errors (COVPE Option) The VARMAX Procedure Prediction Error Covariances Lead 1 2 3 4 5
Variable y1 y2 y1 y2 y1 y2 y1 y2 y1 y2
y1
y2
1.28875 0.39751 2.92119 1.00189 4.59984 1.98771 5.91299 3.04856 6.69463 3.85346
0.39751 1.41839 1.00189 2.18051 1.98771 3.03498 3.04856 4.07738 3.85346 5.07010
Figure 32.37 is the output in a univariate format associated with the COVPE option for the prediction error covariances. This printing format more easily explains the prediction error covariances of each variable.
2124 F Chapter 32: The VARMAX Procedure
Figure 32.37 Covariances of Prediction Errors Prediction Error Covariances by Variable Variable y1
y2
Lead
y1
y2
1 2 3 4 5 1 2 3 4 5
1.28875 2.92119 4.59984 5.91299 6.69463 0.39751 1.00189 1.98771 3.04856 3.85346
0.39751 1.00189 1.98771 3.04856 3.85346 1.41839 2.18051 3.03498 4.07738 5.07010
Covariance Matrices of Prediction Errors in the Presence of Exogenous (Independent) Variables Exogenous variables can be both stochastic and nonstochastic (deterministic) variables. Considering the forecasts in the VARMAX(p,q,s) model, there are two cases. When exogenous (independent) variables are stochastic (future values not specified): As defined in the section “State-Space Representation” on page 2105, yt Cljt has the representation yt Cljt D
1 X
Vj at Cl
j
C
j Dl
1 X
‰j t Cl
j
‰j t Cl
j
j Dl
and hence et Cljt D
l 1 X
Vj at Cl
j
j D0
C
l 1 X j D0
Therefore, the covariance matrix of the l-step-ahead prediction error is given as †.l/ D Cov.et Cljt / D
l 1 X j D0
Vj †a Vj0 C
l 1 X
‰j † ‰j0
j D0
where †a is the covariance of the white noise series at , and at is the white noise series for the VARMA(p,q) model of exogenous (independent) variables, which is assumed not to be correlated with t or its lags.
Forecasting F 2125
When future exogenous (independent) variables are specified: The optimal forecast yt Cljt of yt conditioned on the past information and also on known future values xt C1 ; : : : ; xt Cl can be represented as yt Cljt D
1 X
‰j xtCl
j
1 X
C
j D0
‰j tCl
j
j Dl
and the forecast error is et Cljt D
l 1 X
‰j tCl
j
j D0
Thus, the covariance matrix of the l-step-ahead prediction error is given as †.l/ D Cov.et Cljt / D
l 1 X
‰j † ‰j0
j D0
Decomposition of Prediction Error Covariances Pl 1 o o 0 In the relation †.l/ D j D0 ‰j ‰j , the diagonal elements can be interpreted as providing a decomposition of the l-step-ahead prediction error covariance i2i .l/ for each component series yi t into contributions from the components of the standardized innovations t . If you denote the (i; n)th element of ‰jo by MSE.yi;t Chjt / D E.yi;t Ch
j;i n ,
the MSE of yi;tChjt is
2
l 1 X k X
yi;t Chjt / D
2 j;i n
j D0 nD1
P 1 2 Note that jl D0 j;i n is interpreted as the contribution of innovations in variable n to the prediction error covariance of the l-step-ahead forecast of variable i. The proportion, !l;i n , of the l-step-ahead forecast error covariance of variable i accounting for the innovations in variable n is !l;i n D
l 1 X
2 j;i n =MSE.yi;tChjt /
j D0
The following statements use the DECOMPOSE option to compute the decomposition of prediction error covariances and their proportions for a VAR(1) model: proc varmax data=simul1; model y1 y2 / p=1 noint print=(decompose(15)) printform=univariate; run;
2126 F Chapter 32: The VARMAX Procedure
The proportions of decomposition of prediction error covariances of two variables are given in Figure 32.38. The output explains that about 91.356% of the one-step-ahead prediction error covariances of the variable y2t is accounted for by its own innovations and about 8.644% is accounted for by y1t innovations. Figure 32.38 Decomposition of Prediction Error Covariances (DECOMPOSE Option) Proportions of Prediction Error Covariances by Variable Variable
Lead
y1
y2
1 2 3 4 5 1 2 3 4 5
1.00000 0.88436 0.75132 0.64897 0.58460 0.08644 0.31767 0.50247 0.55607 0.53549
0.00000 0.11564 0.24868 0.35103 0.41540 0.91356 0.68233 0.49753 0.44393 0.46451
y1
y2
Forecasting of the Centered Series If the CENTER option is specified, the sample mean vector is added to the forecast.
Forecasting of the Differenced Series If dependent (endogenous) variables are differenced, the final forecasts and their prediction error covariances are produced by integrating those of the differenced series. However, if the PRIOR option is specified, the forecasts and their prediction error variances of the differenced series are produced. Let zt be the original series with some appended zero values that correspond to the unobserved past observations. Let .B/ be the k k matrix polynomial in the backshift operator that corresponds to the differencing specified by the MODEL statement. The off-diagonal elements of i are zero, and the diagonal elements can be different. Then yt D .B/zt . This gives the relationship zt D
1
.B/yt D
1 X
ƒj yt
j
j D0
where
1 .B/
D
P1
j D0 ƒj B
j
and ƒ0 D Ik .
The l-step-ahead prediction of ztCl is zt Cljt D
l 1 X j D0
ƒj yt Cl
j jt
C
1 X j Dl
ƒj yt Cl
j
Tentative Order Selection F 2127
The l-step-ahead prediction error of zt Cl is l 1 X
ƒj yt Cl
yt Cl
j
j jt
l 1 X
D
0
j X
@
j D0
j D0
1 ƒu ‰ j
u A t Cl j
uD0
Letting †z .0/ D 0, the covariance matrix of the l-step-ahead prediction error of zt Cl , †z .l/, is l 1 X
†z .l/ D
0
j X
@
1 ƒu ‰j
u A †
0 D †z .l
j X
@
10 ƒu ‰j
uA
uD0
uD0
j D0
0
1/ C @
l 1 X
1 ƒj ‰ l
1 j
0
l 1 X
A † @
10 ƒj ‰l
1 j
A
j D0
j D0
If there are stochastic exogenous (independent) variables, the covariance matrix of the l-step-ahead prediction error of zt Cl , †z .l/, is 0 †z .l/ D †z .l
1/ C @
l 1 X
1 ƒj ‰ l
1 j
0
A † @
j D0
0 C@
l 1 X
1 j
j D0
10 ƒj ‰l
1 j
A
j D0
1 ƒj V l
l 1 X
0
A †a @
l 1 X
10 ƒj V l
1 j
A
j D0
Tentative Order Selection Sample Cross-Covariance and Cross-Correlation Matrices Given a stationary multivariate time series yt , cross-covariance matrices are .l/ D EŒ.yt
/.yt Cl
/0
where D E.yt /, and cross-correlation matrices are .l/ D D
1
.l/D
1
where D is a diagonal matrix with the standard deviations of the components of yt on the diagonal. The sample cross-covariance matrix at lag l, denoted as C.l/, is computed as T l 1 X O .l/ D C.l/ D yQ t yQ 0t Cl T t D1
2128 F Chapter 32: The VARMAX Procedure
O where yQ t is the centered data and T is the number of nonmissing observations. Thus, .l/ has .i; j /th element Oij .l/ D cij .l/. The sample cross-correlation matrix at lag l is computed as Oij .l/ D cij .l/=Œci i .0/cjj .0/1=2 ; i; j D 1; : : : ; k The following statements use the CORRY option to compute the sample cross-correlation matrices and their summary indicator plots in terms of C; ; and , where C indicates significant positive cross-correlations, indicates significant negative cross-correlations, and indicates insignificant cross-correlations. proc varmax data=simul1; model y1 y2 / p=1 noint lagmax=3 print=(corry) printform=univariate; run;
Figure 32.39 shows the sample cross-correlation matrices of y1t and y2t . As shown, the sample autocorrelation functions for each variable decay quickly, but are significant with respect to two standard errors. Figure 32.39 Cross-Correlations (CORRY Option) The VARMAX Procedure Cross Correlations of Dependent Series by Variable Variable y1
y2
Lag
y1
y2
0 1 2 3 0 1 2 3
1.00000 0.83143 0.56094 0.26629 0.67041 0.29707 -0.00936 -0.22058
0.67041 0.84330 0.81972 0.66154 1.00000 0.77132 0.48658 0.22014
Schematic Representation of Cross Correlations Variable/ Lag 0 1 2
3
y1 y2
++ -+
++ ++
++ ++
++ .+
+ is > 2*std error, - is < -2*std error, . is between
Tentative Order Selection F 2129
Partial Autoregressive Matrices For each m D 1; 2; : : : ; p you can define a sequence of matrices ˆmm , which is called the partial autoregression matrices of lag m, as the solution for ˆmm to the Yule-Walker equations of order m,
.l/ D
m X
.l
i /ˆ0im ; l D 1; 2; : : : ; m
i D1
The sequence of the partial autoregression matrices ˆmm of order m has the characteristic property that if the process follows the AR(p), then ˆpp D ˆp and ˆmm D 0 for m > p. Hence, the matrices ˆmm have the cutoff property for a VAR(p) model, and so they can be useful in the identification of the order of a pure VAR model. The following statements use the PARCOEF option to compute the partial autoregression matrices: proc varmax data=simul1; model y1 y2 / p=1 noint lagmax=3 printform=univariate print=(corry parcoef pcorr pcancorr roots); run;
Figure 32.40 shows that the model can be obtained by an AR order m D 1 since partial autoregression matrices are insignificant after lag 1 with respect to two standard errors. The matrix for lag 1 is the same as the Yule-Walker autoregressive matrix. Figure 32.40 Partial Autoregression Matrices (PARCOEF Option) The VARMAX Procedure Partial Autoregression Lag 1 2 3
Variable y1 y2 y1 y2 y1 y2
y1
y2
1.14844 0.54985 -0.00724 0.02409 -0.02578 -0.03720
-0.50954 0.37409 0.05138 0.05909 0.03885 0.10149
Schematic Representation of Partial Autoregression Variable/ Lag 1 2 3 y1 y2
+++
.. ..
.. ..
+ is > 2*std error, - is < -2*std error, . is between
2130 F Chapter 32: The VARMAX Procedure
Partial Correlation Matrices Define the forward autoregression m X1
yt D
ˆi;m
1 yt i
C um;t
i D1
and the backward autoregression
yt
m
D
m X1
ˆi;m
1 yt mCi
C um;t
m
i D1
The matrices P .m/ defined by Ansley and Newbold (1979) are given by 1=2 0 1=2 1 ˆmm †m 1
P .m/ D †m where
†m
1
D Cov.um;t / D .0/
m X1
. i/ˆ0i;m
1
i D1
and †m
1
D Cov.um;t
m / D .0/
m X1
.m
0
i/ˆm
i;m 1
i D1
P .m/ are the partial cross-correlation matrices at lag m between the elements of yt and yt m , given yt 1 ; : : : ; yt mC1 . The matrices P .m/ have the cutoff property for a VAR(p) model, and so they can be useful in the identification of the order of a pure VAR structure. The following statements use the PCORR option to compute the partial cross-correlation matrices: proc varmax data=simul1; model y1 y2 / p=1 noint lagmax=3 print=(pcorr) printform=univariate; run;
The partial cross-correlation matrices in Figure 32.41 are insignificant after lag 1 with respect to two standard errors. This indicates that an AR order of m D 1 can be an appropriate choice.
Tentative Order Selection F 2131
Figure 32.41 Partial Correlations (PCORR Option) The VARMAX Procedure Partial Cross Correlations by Variable Variable
Lag
y1
y2
1 2 3 1 2 3
0.80348 0.00276 -0.01091 -0.30946 0.04676 0.01993
0.42672 0.03978 0.00032 0.71906 0.07045 0.10676
y1
y2
Schematic Representation of Partial Cross Correlations Variable/ Lag 1 2 3 y1 y2
++ -+
.. ..
.. ..
+ is > 2*std error, - is < -2*std error, . is between
Partial Canonical Correlation Matrices The partial canonical correlations at lag m between the vectors yt and yt m , given yt 1 ; : : : ; yt mC1 , are 1 1 .m/ 2 .m/ k .m/. The partial canonical correlations are the canonical correlations between the residual series um;t and um;t m , where um;t and um;t m are defined in the previous section. Thus, the squared partial canonical correlations i2 .m/ are the eigenvalues of the matrix fCov.um;t /g
1
0
E.um;t um;t
0 1 m /fCov.um;t m /g E.um;t m um;t /
0
0
D ˆmm ˆmm
It follows that the test statistic to test for ˆm D 0 in the VAR model of order m > p is approximately .T
0
0
m/ tr fˆmm ˆmm g .T
m/
k X
i2 .m/
i D1
and has an asymptotic chi-square distribution with k 2 degrees of freedom for m > p. The following statements use the PCANCORR option to compute the partial canonical correlations: proc varmax data=simul1; model y1 y2 / p=1 noint lagmax=3 print=(pcancorr); run;
2132 F Chapter 32: The VARMAX Procedure
Figure 32.42 shows that the partial canonical correlations i .m/ between yt and yt m are {0.918, 0.773}, {0.092, 0.018}, and {0.109, 0.011} for lags m D1 to 3. After lag m D1, the partial canonical correlations are insignificant with respect to the 0.05 significance level, indicating that an AR order of m D 1 can be an appropriate choice. Figure 32.42 Partial Canonical Correlations (PCANCORR Option) The VARMAX Procedure Partial Canonical Correlations Lag
Correlation1
Correlation2
DF
Chi-Square
Pr > ChiSq
1 2 3
0.91783 0.09171 0.10861
0.77335 0.01816 0.01078
4 4 4
142.61 0.86 1.16
2*std error, is < -2*std error, . is between, * is N/A
Model Parameter Estimates
Equation Parameter
Estimate
y1
1.01809 -0.38651 0.32291 -0.02153 0.39147 0.55290 -0.16566 0.58612
y2
AR1_1_1 AR1_1_2 MA1_1_1 MA1_1_2 AR1_2_1 AR1_2_2 MA1_2_1 MA1_2_2
Standard Error t Value Pr > |t| Variable 0.10257 0.09644 0.14530 0.14200 0.10062 0.08421 0.15700 0.14115
9.93 -4.01 2.22 -0.15 3.89 6.57 -1.06 4.15
0.0001 0.0001 0.0285 0.8798 0.0002 0.0001 0.2939 0.0001
y1(t-1) y2(t-1) e1(t-1) e2(t-1) y1(t-1) y2(t-1) e1(t-1) e2(t-1)
2148 F Chapter 32: The VARMAX Procedure
The fitted VARMA(1,1) model with estimated standard errors in parentheses is given as 0
1 1:01809 0:38651 B .0:10256/ .0:09644/ C Cy yt D B @ 0:39147 0:55290 A t .0:10062/ .0:08421/
0 1
C t
1 0:32291 0:02153 B .0:14530/ .0:14199/ C B C @ 0:16566 0:58613 A t .0:15699/ .0:14115/
1
VARMAX Modeling A VARMAX(p; q; s) process is written as
yt D ı C
p X
ˆi yt
i
i D1
C
s X
‚i xt i
i D0
C t
q X
‚i t
i
i D1
or ˆ.B/yt D ı C ‚ .B/xt C ‚.B/t where D Ik Pq ˆ.B/ i i D1 ‚i B .
Pp
i D1 ˆi B
i,
‚ .B/ D ‚0 C ‚1 B C C ‚s B s , and ‚.B/ D Ik
The dimension of the state-space vector of the Kalman filtering method for the parameter estimation of the VARMAX(p,q,s) model is large, which takes time and memory for computing. For convenience, the parameter estimation of the VARMAX(p,q,s) model uses the two-stage estimation method, which first estimates the deterministic terms and exogenous parameters, and then maximizes the log-likelihood function of a VARMA(p,q) model. Some examples of VARMAX modeling are as follows: model y1 y2 = x1 / q=1; nloptions tech=qn;
model y1 y2 = x1 / p=1 q=1 xlag=1 nocurrentx; nloptions tech=qn;
Model Diagnostic Checks Multivariate Model Diagnostic Checks Information Criterion After fitting some candidate models to the data, various model selection criteria (normalized by T ) can be used to choose the appropriate model. The following list includes the Akaike information criterion (AIC), the corrected Akaike information criterion
Model Diagnostic Checks F 2149
(AICC), the final prediction error criterion (FPE), the Hannan-Quinn criterion (HQC), and the Schwarz Bayesian criterion (SBC, also referred to as BIC): Q C 2r=T AIC D log.j†j/ Q C 2r=.T AICC D log.j†j/
r=k/ T C r=k k Q / j†j FPE D . T r=k Q C 2r log.log.T //=T HQC D log.j†j/ Q C r log.T /=T SBC D log.j†j/
where r denotes the number of parameters estimated, k is the number of dependent variables, Q is the maximum likelihood T is the number of observations used to estimate the model, and † estimate of †. When comparing models, choose the model with the smallest criterion values. An example of the output was displayed in Figure 32.4. Portmanteau Qs statistic The Portmanteau Qs statistic is used to test whether correlation remains on the model residuals. The null hypothesis is that the residuals are uncorrelated. Let C .l/ be the residual cross-covariance matrices, O .l/ be the residual cross-correlation matrices as C .l/ D T
T Xl
1
t t0 Cl
t D1
and O .l/ D VO
1=2
C .l/VO
1=2
and O . l/ D O .l/0
2 2 O The multivariate where VO D Diag.O 11 ; : : : ; O kk / and O i2i are the diagonal elements of †. portmanteau test defined in Hosking (1980) is
Qs D T
2
s X
.T
l/
1
trfO .l/†
1
O . l/†
1
g
lD1
The statistic Qs has approximately the chi-square distribution with k 2 .s freedom. An example of the output is displayed in Figure 32.7.
p
q/ degrees of
Univariate Model Diagnostic Checks There are various ways to perform diagnostic checks for a univariate model. For details, see the section “Testing for Nonlinear Dependence: Heteroscedasticity Tests” on page 402 in Chapter 8, “The AUTOREG Procedure.” An example of the output is displayed in Figure 32.8 and Figure 32.9. Durbin-Watson (DW) statistics: The DW test statistics test for the first order autocorrelation in the residuals.
2150 F Chapter 32: The VARMAX Procedure
Jarque-Bera normality test: This test is helpful in determining whether the model residuals represent a white noise process. This tests the null hypothesis that the residuals have normality. F tests for autoregressive conditional heteroscedastic (ARCH) disturbances: F test statistics test for the heteroscedastic disturbances in the residuals. This tests the null hypothesis that the residuals have equal covariances F tests for AR disturbance: These test statistics are computed from the residuals of the univariate AR(1), AR(1,2), AR(1,2,3) and AR(1,2,3,4) models to test the null hypothesis that the residuals are uncorrelated.
Cointegration This section briefly introduces the concepts of cointegration (Johansen 1995b). Definition 1. (Engle and Granger 1987): If a series yt with no deterministic components can be represented by a stationary and invertible ARMA process after differencing d times, the series is integrated of order d , that is, yt I.d /. Definition 2. (Engle and Granger 1987): If all elements of the vector yt are I.d / and there exists a cointegrating vector ˇ ¤ 0 such that ˇ 0 yt I.d b/ for any b > 0, the vector process is said to be cointegrated CI.d; b/. A simple example of a cointegrated process is the following bivariate system: y1t
D y2t C 1t
y2t
D y2;t
1
C 2t
with 1t and 2t being uncorrelated white noise processes. In the second equation, y2t is a random walk, y2t D 2t , 1 B. Differencing the first equation results in y1t D y2t C 1t D 2t C 1t
1;t
1
Thus, both y1t and y2t are I.1/ processes, but the linear combination y1t y2t is stationary. Hence yt D .y1t ; y2t /0 is cointegrated with a cointegrating vector ˇ D .1; /0 . In general, if the vector process yt has k components, then there can be more than one cointegrating vector ˇ 0 . It is assumed that there are r linearly independent cointegrating vectors with r < k, which make the k r matrix ˇ. The rank of matrix ˇ is r, which is called the cointegration rank of yt .
Common Trends This section briefly discusses the implication of cointegration for the moving-average representation. Let yt be cointegrated CI.1; 1/, then yt has the Wold representation: yt D ı C ‰.B/t
Cointegration F 2151
where t is i id.0; †/, ‰.B/ D
P1
j D0 ‰j B
j
with ‰0 D Ik , and
P1
j D0 j j‰j j
< 1.
Assume that t D 0 if t 0 and y0 is a nonrandom initial value. Then the difference equation implies that
yt D y0 C ıt C ‰.1/
t X
i C ‰ .B/t
i D0
where ‰ .B/ D .1
B/
1 .‰.B/
‰.1// and ‰ .B/ is absolutely summable.
Assume that the rank of ‰.1/ is m D k r. When the process yt is cointegrated, there is a cointegrating k r matrix ˇ such that ˇ 0 yt is stationary. Premultiplying yt by ˇ 0 results in ˇ 0 yt D ˇ 0 y0 C ˇ 0 ‰ .B/t because ˇ 0 ‰.1/ D 0 and ˇ 0 ı D 0. Stock and Watson (1988) showed that the cointegrated process yt has a common trends representation derived from the moving-average representation. Since the rank of ‰.1/ is m D k r, there is a k r matrix H1 with rank r such that ‰.1/H1 D 0. Let H2 be a k m matrix with rank m such that H20 H1 D 0; then A D C.1/H2 has rank m. The H D .H1 ; H2 / has rank k. By construction of H, ‰.1/H D Œ0; A D ASm where Sm D .0mr ; Im /. Since ˇ 0 ‰.1/ D 0 and ˇ 0 ı D 0, ı lies in the column space of ‰.1/ and can be written ı D C.1/ıQ where ıQ is a k-dimensional vector. The common trends representation is written as
yt
Q C D y0 C ‰.1/Œıt
t X
i C ‰ .B/t
i D0
D y0 C ‰.1/H ŒH
1Q
ıt C H
1
t X
i C at
i D0
D y0 C At C at and t D C t
1
C vt
where at D ‰ .B/t , D Sm H
1 ı, Q
t D Sm ŒH
1 ıt Q
CH
1
Pt
i D0 i ,
and vt D Sm H
1
t.
Stock and Watson showed that the common trends representation expresses yt as a linear combination of m random walks (t ) with drift plus I.0/ components (at /.
2152 F Chapter 32: The VARMAX Procedure
Test for the Common Trends Stock and Watson (1988) proposed statistics for common trends testing. The null hypothesis is that the k-dimensional time series yt has m common stochastic trends, where m k and the alternative is that it has s common trends, where s < m . The test procedure of m versus s common stochastic trends is performed based on the first-order serial correlation matrix of yt . Let ˇ? be a k m matrix 0 0 orthogonal to the cointegrating matrix such that ˇ? ˇ D 0 and ˇ? ˇ? D Im . Let zt D ˇ 0 yt and 0 wt D ˇ? yt . Then wt D
0 ˇ? y0
C
0 ˇ? ıt
C
0 ˇ? ‰.1/
t X
0 i C ˇ? ‰ .B/t
i D0
Combining the expression of zt and wt ,
zt wt
D C
ˇ 0 y0 0 ˇ? y0
C
ˇ 0 ‰ .B/ 0 ˇ? ‰ .B/
0 0
ˇ? ı
tC
0 0 ˇ? ‰.1/
X t
i
iD1
t
The Stock-Watson common trends test is performed based on the component wt by testing whether 0 ˇ? ‰.1/ has rank m against rank s. The following statements perform the Stock-Watson test for common trends: proc iml; sig = 100*i(2); phi = {-0.2 0.1, 0.5 0.2, 0.8 0.7, -0.4 0.6}; call varmasim(y,phi) sigma=sig n=100 initial=0 seed=45876; cn = {'y1' 'y2'}; create simul2 from y[colname=cn]; append from y; quit; data simul2; set simul2; date = intnx( 'year', '01jan1900'd, _n_-1 ); format date year4. ; run; proc varmax data=simul2; model y1 y2 / p=2 cointtest=(sw); run;
In Figure 32.51, the first column is the null hypothesis that yt has m k common trends; the second column is the alternative hypothesis that yt has s < m common trends; the third column contains the eigenvalues used for the test statistics; the fourth column contains the test statistics using AR(p) filtering of the data. The table shows the output of the case p D 2.
Vector Error Correction Modeling F 2153
Figure 32.51 Common Trends Test (COINTTEST=(SW) Option) The VARMAX Procedure Common Trend Test
H0: Rank=m
H1: Rank=s
Eigenvalue
Filter
1 2
0 0 1
1.000906 0.996763 0.648908
0.09 -0.32 -35.11
5% Critical Value -14.10 -8.80 -23.00
Lag 2
The test statistic for testing for 2 versus 1 common trends is more negative (–35.1) than the critical value (–23.0). Therefore, the test rejects the null hypothesis, which means that the series has a single common trend.
Vector Error Correction Modeling This section discusses the implication of cointegration for the autoregressive representation. Assume that the cointegrated series can be represented by a vector error correction model according to the Granger representation theorem (Engle and Granger 1987). Consider the vector autoregressive process with Gaussian errors defined by
yt D
p X
ˆi yt
i
C t
i D1
or ˆ.B/yt D t where the initial values, y pC1 ; : : : ; y0 , are fixed and t N.0; †/. Since the AR operator ˆ.B/ Pp 1 i can be re-expressed as ˆ.B/ D ˆ .B/.1 B/ C ˆ.1/B, where ˆ .B/ D Ik i D1 ˆi B with P p ˆi D ˆ , the vector error correction model is j Di C1 j ˆ .B/.1
B/yt D ˛ˇ 0 yt
1
C t
or yt D ˛ˇ 0 yt
1C
p X1
ˆi yt
i
C t
i D1
where ˛ˇ 0 D
ˆ.1/ D
Ik C ˆ1 C ˆ2 C C ˆp .
2154 F Chapter 32: The VARMAX Procedure
One motivation for the VECM(p) form is to consider the relation ˇ 0 yt D c as defining the underlying economic relations and assume that the agents react to the disequilibrium error ˇ 0 yt c through the adjustment coefficient ˛ to restore equilibrium; that is, they satisfy the economic relations. The cointegrating vector, ˇ is sometimes called the long-run parameters. You can consider a vector error correction model with a deterministic term. The deterministic term Dt can contain a constant, a linear trend, and seasonal dummy variables. Exogenous variables can also be included in the model.
yt D …yt
1
C
p X1
ˆi yt
i
C ADt C
i D1
s X
‚i xt
i
C t
i D0
where … D ˛ˇ 0 . The alternative vector error correction representation considers the error correction term at lag t and is written as yt D
p X1
]
ˆi yt
i
C …] yt
p C ADt C
i D1
s X
‚i xt
i
p
C t
i D0
If the matrix … has a full-rank (r D k), all components of yt are I.0/. On the other hand, yt are stationary in difference if rank.…/ D 0. When the rank of the matrix … is r < k, there are k r linear combinations that are nonstationary and r stationary cointegrating relations. Note that the linearly independent vector zt D ˇ 0 yt is stationary and this transformation is not unique unless r D 1. There does not exist a unique cointegrating matrix ˇ since the coefficient matrix … can also be decomposed as … D ˛MM
1 0
ˇ D ˛ ˇ
0
where M is an r r nonsingular matrix.
Test for the Cointegration The cointegration rank test determines the linearly independent columns of …. Johansen (1988, 1995a) and Johansen and Juselius (1990) proposed the cointegration rank test by using the reduced rank regression. Different Specifications of Deterministic Trends When you construct the VECM(p) form from the VAR(p) model, the deterministic terms in the VECM(p) form can differ from those in the VAR(p) model. When there are deterministic cointegrated relationships among variables, deterministic terms in the VAR(p) model are not present in the VECM(p) form. On the other hand, if there are stochastic cointegrated relationships in the VAR(p) model, deterministic terms appear in the VECM(p) form via the error correction term or as an independent term in the VECM(p) form. There are five different specifications of deterministic trends in the VECM(p) form.
Vector Error Correction Modeling F 2155
Case 1: There is no separate drift in the VECM(p) form. 0
yt D ˛ˇ yt
1
C
p X1
ˆi yt
i
C t
i D1
Case 2: There is no separate drift in the VECM(p) form, but a constant enters only via the error correction term. yt D ˛.ˇ
0
; ˇ0 /.y0t 1 ; 1/0
C
p X1
ˆi yt
i
C t
i D1
Case 3: There is a separate drift and no separate linear trend in the VECM(p) form. yt D ˛ˇ 0 yt
1C
p X1
ˆi yt
i
C ı0 C t
i D1
Case 4: There is a separate drift and no separate linear trend in the VECM(p) form, but a linear trend enters only via the error correction term. yt D ˛.ˇ
0
; ˇ1 /.y0t 1 ; t /0
C
p X1
ˆi yt
i
C ı0 C t
i D1
Case 5: There is a separate linear trend in the VECM(p) form. yt D ˛ˇ 0 yt
1C
p X1
ˆi yt
i
C ı0 C ı1 t C t
i D1
First, focus on Cases 1, 3, and 5 to test the null hypothesis that there are at most r cointegrating vectors. Let Z0t
D yt
Z1t
D yt
Z2t
D
Z0 D Z1 D Z2 D
1 Œy0t 1 ; : : : ; y0t pC1 ; Dt 0 ŒZ01 ; : : : ; Z0T 0 ŒZ11 ; : : : ; Z1T 0 ŒZ21 ; : : : ; Z2T 0
where Dt can be empty for Case 1, 1 for Case 3, and .1; t/ for Case 5. In Case 2, Z1t and Z2t are defined as Z1t
D Œy0t
0 1 ; 1 Œy0t 1 ; : : : ; y0t pC1 0
Z2t
D
2156 F Chapter 32: The VARMAX Procedure
In Case 4, Z1t and Z2t are defined as Z1t
D Œy0t
Z2t
D
0 1; t Œy0t 1 ; : : : ; y0t pC1 ; 10
Let ‰ be the matrix of parameters consisting of ˆ1 , . . . , ˆp 1 , A, and ‚0 , . . . , ‚s , where parameters A corresponds to regressors Dt . Then the VECM(p) form is rewritten in these variables as Z0t D ˛ˇ 0 Z1t C ‰Z2t C t The log-likelihood function is given by ` D
kT log 2 2 T 1X .Z0t 2
T log j†j 2 ˛ˇ 0 Z1t
‰Z2t /0 †
1
.Z0t
˛ˇ 0 Z1t
‰Z2t /
t D1
The residuals, R0t and R1t , are obtained by regressing Z0t and Z1t on Z2t , respectively. The regression equation of residuals is R0t D ˛ˇ 0 R1t C O t The crossproducts matrices are computed Sij D
T 1 X 0 Rit Rjt ; i; j D 0; 1 T tD1
Then the maximum likelihood estimator for ˇ is obtained from the eigenvectors that correspond to the r largest eigenvalues of the following equation: jS11
S10 S001 S01 j D 0
The eigenvalues of the preceding equation are squared canonical correlations between R0t and R1t , and the eigenvectors that correspond to the r largest eigenvalues are the r linear combinations of yt 1 , which have the largest squared partial correlations with the stationary process yt after correcting for lags and deterministic terms. Such an analysis calls for a reduced rank regression of yt on yt 1 corrected for .yt 1 ; : : : ; yt pC1 ; Dt /, as discussed by Anderson (1951). Johansen (1988) suggests two test statistics to test the null hypothesis that there are at most r cointegrating vectors H0 W i D 0 for i D r C 1; : : : ; k
Vector Error Correction Modeling F 2157
Trace Test The trace statistic for testing the null hypothesis that there are at most r cointegrating vectors is as follows: t race D
T
k X
log.1
i /
i DrC1
The asymptotic distribution of this statistic is given by (Z ) Z 1 1Z 1 1 0 0 0 tr .d W /WQ WQ WQ dr WQ .d W / 0
0
0
where t r.A/ is the trace of a matrix A, W is the k r dimensional Brownian motion, and WQ is the Brownian motion itself, or the demeaned or detrended Brownian motion according to the different specifications of deterministic trends in the vector error correction model. Maximum Eigenvalue Test The maximum eigenvalue statistic for testing the null hypothesis that there are at most r cointegrating vectors is as follows: max D
T log.1
rC1 /
The asymptotic distribution of this statistic is given by Z 1 Z maxf .d W /WQ 0 . 0
1 0
WQ WQ 0 dr/
1
1
Z 0
WQ .d W /0 g
where max.A/ is the maximum eigenvalue of a matrix A. Osterwald-Lenum (1992) provided detailed tables of the critical values of these statistics. The following statements use the JOHANSEN option to compute the Johansen cointegration rank trace test of integrated order 1: proc varmax data=simul2; model y1 y2 / p=2 cointtest=(johansen=(normalize=y1)); run;
Figure 32.52 shows the output based on the model specified in the MODEL statement, an intercept term is assumed. In the “Cointegration Rank Test Using Trace” table, the column Drift In ECM means there is no separate drift in the error correction model and the column Drift In Process means the process has a constant drift before differencing. The “Cointegration Rank Test Using Trace” table shows the trace statistics based on Case 3 and the “Cointegration Rank Test Using Trace under Restriction” table shows the trace statistics based on Case 2. The output indicates that the series are cointegrated with rank 1 because the trace statistics are smaller than the critical values in both Case 2 and Case 3.
2158 F Chapter 32: The VARMAX Procedure
Figure 32.52 Cointegration Rank Test (COINTTEST=(JOHANSEN=) Option) The VARMAX Procedure Cointegration Rank Test Using Trace
H0: Rank=r
H1: Rank>r
Eigenvalue
Trace
0 1
0 1
0.4644 0.0056
61.7522 0.5552
5% Critical Value 15.34 3.84
Drift in ECM
Drift in Process
Constant
Linear
Cointegration Rank Test Using Trace Under Restriction
H0: Rank=r
H1: Rank>r
Eigenvalue
Trace
0 1
0 1
0.5209 0.0426
76.3788 4.2680
5% Critical Value 19.99 9.13
Drift in ECM
Drift in Process
Constant
Constant
Figure 32.53 shows which result, either Case 2 (the hypothesis H0) or Case 3 (the hypothesis H1), is appropriate depending on the significance level. Since the cointegration rank is chosen to be 1 by the result in Figure 32.52, look at the last row that corresponds to rank=1. Since the p-value is 0.054, the Case 2 cannot be rejected at the significance level 5%, but it can be rejected at the significance level 10%. For modeling of the two Case 2 and Case 3, see Figure 32.56 and Figure 32.57. Figure 32.53 Cointegration Rank Test Continued Hypothesis of the Restriction
Hypothesis
Drift in ECM
Drift in Process
H0(Case 2) H1(Case 3)
Constant Constant
Constant Linear
Hypothesis Test of the Restriction
Rank
Eigenvalue
Restricted Eigenvalue
DF
Chi-Square
Pr > ChiSq
0 1
0.4644 0.0056
0.5209 0.0426
2 1
14.63 3.71
0.0007 0.0540
Vector Error Correction Modeling F 2159
Figure 32.54 shows the estimates of long-run parameter (Beta) and adjustment coefficients (Alpha) based on Case 3. Figure 32.54 Cointegration Rank Test Continued Beta Variable y1 y2
1
2
1.00000 -2.04869
1.00000 -0.02854
Alpha Variable y1 y2
1
2
-0.46421 0.17535
-0.00502 -0.01275
Using the NORMALIZE= option, the first low of the “Beta” table has 1. Considering that the cointegration rank is 1, the long-run relationship of the series is 0
ˇ yt
D
1
D y1t y1t
2:04869
y1 y2
2:04869y2t
D 2:04869y2t
Figure 32.55 shows the estimates of long-run parameter (Beta) and adjustment coefficients (Alpha) based on Case 2. Figure 32.55 Cointegration Rank Test Continued Beta Under Restriction Variable y1 y2 1
1
2
1.00000 -2.04366 6.75919
1.00000 -2.75773 101.37051
Alpha Under Restriction Variable y1 y2
1
2
-0.48015 0.12538
0.01091 0.03722
2160 F Chapter 32: The VARMAX Procedure
Considering that the cointegration rank is 1, the long-run relationship of the series is 3 y1 2:04366 6:75919 4 y2 5 1 2:04366 y2t C 6:75919 2
ˇ 0 yt
D
1
D y1t y1t
D 2:04366 y2t
6:75919
Estimation of Vector Error Correction Model The preceding log-likelihood function is maximized for 1=2 ˇO D S11 Œv1 ; : : : ; vr O ˇO 0 S11 ˇ/ O 1 ˛O D S01 ˇ.
O D ˛O ˇO 0 … O 0 D .Z20 Z2 / 1 Z20 .Z0 Z1 … O 0/ ‰ O D .Z0 Z2 ‰ O 0 Z1 … O 0 /0 .Z0 Z2 ‰ O0 †
O 0 /=T Z1 …
The estimators of the orthogonal complements of ˛ and ˇ are ˇO? D S11 ŒvrC1 ; : : : ; vk and ˛O ? D S001 S01 ŒvrC1 ; : : : ; vk The ML estimators have the following asymptotic properties: p
d
O ‰ O T vec.Œ…;
Œ…; ‰/ ! N.0; †co /
where †co D † ˝
ˇ 0 0 Ik
1
ˇ0 0 0 Ik
and 1 D plim T
ˇ 0 Z10 Z1 ˇ ˇ 0 Z10 Z2 Z20 Z1 ˇ Z20 Z2
The following statements are examples of fitting the five different cases of the vector error correction models mentioned in the previous section.
Vector Error Correction Modeling F 2161
For fitting Case 1, model y1 y2 / p=2 ecm=(rank=1 normalize=y1) noint;
For fitting Case 2, model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend);
For fitting Case 3, model y1 y2 / p=2 ecm=(rank=1 normalize=y1);
For fitting Case 4, model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend) trend=linear;
For fitting Case 5, model y1 y2 / p=2 ecm=(rank=1 normalize=y1) trend=linear;
From Figure 32.53 that uses the COINTTEST=(JOHANSEN) option, you can fit the model by using either Case 2 or Case 3 because the test was not significant at the 0.05 level, but was significant at the 0.10 level. Here both models are fitted to show the difference in output display. Figure 32.56 is for Case 2, and Figure 32.57 is for Case 3. For Case 2, proc varmax data=simul2; model y1 y2 / p=2 ecm=(rank=1 normalize=y1 ectrend) print=(estimates); run;
2162 F Chapter 32: The VARMAX Procedure
Figure 32.56 Parameter Estimation with the ECTREND Option The VARMAX Procedure Parameter Alpha * Beta' Estimates Variable y1 y2
y1
y2
1
-0.48015 0.12538
0.98126 -0.25624
-3.24543 0.84748
AR Coefficients of Differenced Lag DIF Lag 1
Variable y1 y2
y1
y2
-0.72759 0.38982
-0.77463 -0.55173
Model Parameter Estimates
Equation Parameter
Estimate
D_y1
-3.24543 -0.48015 0.98126 -0.72759 -0.77463 0.84748 0.12538 -0.25624 0.38982 -0.55173
CONST1 AR1_1_1 AR1_1_2 AR2_1_1 AR2_1_2 CONST2 AR1_2_1 AR1_2_2 AR2_2_1 AR2_2_2
D_y2
Standard Error t Value Pr > |t| Variable 0.33022 0.04886 0.09984 0.04623 0.04978 0.35394 0.05236 0.10702 0.04955 0.05336
-15.74 -15.56
0.0001 0.0001
7.87 -10.34
0.0001 0.0001
1, EC y1(t-1) y2(t-1) D_y1(t-1) D_y2(t-1) 1, EC y1(t-1) y2(t-1) D_y1(t-1) D_y2(t-1)
Figure 32.56 can be reported as follows: yt
0:48015 0:12538
D C
0:72759 0:38982
0:98126 0:25624
2
y1;t 4 y2;t 1
3:24543 0:84748 0:77463 yt 1 C t 0:55173
1 1
3 5
The keyword “EC” in the “Model Parameter Estimates” table means that the ECTREND option is used for fitting the model. For fitting Case 3, proc varmax data=simul2; model y1 y2 / p=2 ecm=(rank=1 normalize=y1) print=(estimates); run;
Vector Error Correction Modeling F 2163
Figure 32.57 Parameter Estimation without the ECTREND Option The VARMAX Procedure Parameter Alpha * Beta' Estimates Variable y1 y2
y1
y2
-0.46421 0.17535
0.95103 -0.35923
AR Coefficients of Differenced Lag DIF Lag 1
Variable y1 y2
y1
y2
-0.74052 0.34820
-0.76305 -0.51194
Model Parameter Estimates
Equation Parameter
Estimate
D_y1
-2.60825 -0.46421 0.95103 -0.74052 -0.76305 3.43005 0.17535 -0.35923 0.34820 -0.51194
CONST1 AR1_1_1 AR1_1_2 AR2_1_1 AR2_1_2 CONST2 AR1_2_1 AR1_2_2 AR2_2_1 AR2_2_2
D_y2
Standard Error t Value Pr > |t| Variable 1.32398 0.05474 0.11215 0.05060 0.05352 1.39587 0.05771 0.11824 0.05335 0.05643
-1.97
-14.63 -14.26 2.46
6.53 -9.07
0.0518 1 y1(t-1) y2(t-1) 0.0001 D_y1(t-1) 0.0001 D_y2(t-1) 0.0159 1 y1(t-1) y2(t-1) 0.0001 D_y1(t-1) 0.0001 D_y2(t-1)
Figure 32.57 can be reported as follows: yt
0:46421 0:17535
D C
0:95103 0:35293 2:60825 C t 3:43005
yt
1
C
0:74052 0:34820
0:76305 0:51194
yt
1
Test for the Linear Restriction on the Parameters Consider the example with the variables mt log real money, yt log real income, itd deposit interest rate, and itb bond interest rate. It seems a natural hypothesis that in the long-run relation, money and income have equal coefficients with opposite signs. This can be formulated as the hypothesis that the cointegrated relation contains only mt and yt through mt yt . For the analysis, you can express these restrictions in the parameterization of H such that ˇ D H, where H is a known k s matrix
2164 F Chapter 32: The VARMAX Procedure
and
is the s r.r s < k/ parameter matrix to be estimated. For this example, H is given by 2 3 1 0 0 6 1 0 0 7 7 H D6 4 0 1 0 5 0 0 1
Restriction H0 W ˇ D H When the linear restriction ˇ D H is given, it implies that the same restrictions are imposed on all cointegrating vectors. You obtain the maximum likelihood estimator of ˇ by reduced rank regression of yt on H yt 1 corrected for .yt 1 ; : : : ; yt pC1 ; Dt /, solving the following equation jH 0 S11 H
H 0 S10 S001 S01 H j D 0
for the eigenvalues 1 > 1 > > s > 0 and eigenvectors .v1 ; : : : ; vs /, Sij given in the preceding section. Then choose O D .v1 ; : : : ; vr / that corresponds to the r largest eigenvalues, and the ˇO is O H . The test statistic for H0 W ˇ D H is given by T
r X
logf.1
i /=.1
d
i /g ! 2r.k
s/
i D1
If the series has no deterministic trend, the constant term should be restricted by ˛0? ı0 D 0 as in Case 2. Then H is given by 2 3 1 0 0 0 6 1 0 0 0 7 6 7 7 0 1 0 0 H D6 6 7 4 0 0 1 0 5 0 0 0 1 The following statements test that 2 ˇ1 C ˇ2 D 0: proc varmax data=simul2; model y1 y2 / p=2 ecm=(rank=1 normalize=y1); cointeg rank=1 h=(1,-2); run;
Figure 32.58 shows the results of testing H0 W 2ˇ1 C ˇ2 D 0. The input H matrix is H D .1 2/0 . The adjustment coefficient is reestimated under the restriction, and the test indicates that you cannot reject the null hypothesis.
Vector Error Correction Modeling F 2165
Figure 32.58 Testing of Linear Restriction (H= Option) The VARMAX Procedure Beta Under Restriction Variable
1
y1 y2
1.00000 -2.00000
Alpha Under Restriction Variable
1
y1 y2
-0.47404 0.17534
Hypothesis Test
Index
Eigenvalue
Restricted Eigenvalue
DF
Chi-Square
Pr > ChiSq
1
0.4644
0.4616
1
0.51
0.4738
Test for the Weak Exogeneity and Restrictions of Alpha Consider a vector error correction model: 0
yt D ˛ˇ yt
1
C
p X1
ˆi yt
i
C ADt C t
i D1
Divide the process yt into .y01t ; y02t /0 with dimension k1 and k2 and the † into †D
†11 †12 †21 †22
Similarly, the parameters can be decomposed as follows: ˛D
˛1 ˛2
ˆi
D
ˆ1i ˆ2i
AD
A1 A2
Then the VECM(p) form can be rewritten by using the decomposed parameters and processes:
y1t y2t
D
˛1 ˛2
0
ˇ yt
1
C
p X1 iD1
ˆ1i ˆ2i
yt
i
C
A1 A2
Dt C
1t 2t
2166 F Chapter 32: The VARMAX Procedure
The conditional model for y1t given y2t is
!˛2 /ˇ 0 yt
D !y2t C .˛1
y1t
1C
p X1
.ˆ1i
!ˆ2i /yt
i
i D1
C.A1
!A2 /Dt C 1t
!2t
and the marginal model of y2t is y2t D ˛2 ˇ 0 yt
1C
p X1
ˆ2i yt
i
C A2 Dt C 2t
i D1
where ! D †12 †221 . The test of weak exogeneity of y2t for the parameters .˛1 ; ˇ/ determines whether ˛2 D 0. Weak exogeneity means that there is no information about ˇ in the marginal model or that the variables y2t do not react to a disequilibrium. Restriction H0 W ˛ D J Consider the null hypothesis H0 W ˛ D J , where J is a k m matrix with r m < k. From the previous residual regression equation R0t D ˛ˇ 0 R1t C O t D J ˇ 0 R1t C O t you can obtain JN 0 R0t
D
ˇ 0 R1t C JN 0 O t
J?0 R0t
D J?0 O t
where JN D J.J 0 J /
1
and J? is orthogonal to J such that J?0 J D 0.
Define †JJ? D JN 0 †J? and †J? J? D J?0 †J? and let ! D †JJ? †J?1 J? . Then JN 0 R0t can be written as JN 0 R0t D
ˇ 0 R1t C !J?0 R0t C JN 0 O t
!J?0 O t
Using the marginal distribution of J?0 R0t and the conditional distribution of JN 0 R0t , the new residuals are computed as RQ J t RQ 1t
D JN 0 R0t D R1t
SJJ? SJ?1J? J?0 R0t S1J? SJ?1J? J?0 R0t
Vector Error Correction Modeling F 2167
where SJJ? D JN 0 S00 J? ; SJ? J? D J?0 S00 J? ; and SJ? 1 D J?0 S01 In terms of RQ J t and RQ 1t , the MLE of ˇ is computed by using the reduced rank regression. Let Sij:J? D
T 1 X Q Q0 Rit Rjt ; for i; j D 1; J T t D1
Under the null hypothesis H0 W ˛ D J , the MLE ˇQ is computed by solving the equation jS11:J?
1 S jD0 S1J:J? SJJ:J ? J1:J?
Then ˇQ D .v1 ; : : : ; vr /, where the eigenvectors correspond to the r largest eigenvalues. The likelihood ratio test for H0 W ˛ D J is T
r X
logf.1
i /=.1
d
i /g ! 2r.k
m/
i D1
The test of weak exogeneity of y2t is a special case of the test ˛ D J , considering J D .Ik1 ; 0/0 . Consider the previous example with four variables ( mt ; yt ; itb ; itd ). If r D 1, you formulate the weak exogeneity of (yt ; itb ; itd ) for mt as J D Œ1; 0; 0; 00 and the weak exogeneity of itd for (mt ; yt ; itb ) as J D ŒI3 ; 00 . The following statements test the weak exogeneity of other variables, assuming r D 1: proc varmax data=simul2; model y1 y2 / p=2 ecm=(rank=1 normalize=y1); cointeg rank=1 exogeneity; run; proc varmax data=simul2; model y1 y2 / p=2 ecm=(rank=1 normalize=y1); cointeg rank=1 j=exogeneity; run;
Figure 32.59 shows that each variable is not the weak exogeneity of other variable. Figure 32.59 Testing of Weak Exogeneity (EXOGENEITY Option) The VARMAX Procedure Testing Weak Exogeneity of Each Variables Variable y1 y2
DF
Chi-Square
Pr > ChiSq
1 1
53.46 8.76
0,
yt Cl D ı.t C l/ C
t t Cl X Xi
‰ j i C
i D1 j D0
l i l X X
‰j t Ci
i D1 j D0
The l-step-ahead forecast is derived from the preceding equation:
yt Cljt D .t C l/ C
t t Cl X Xi
‰j i
i D1 j D0
Note that lim ˇ 0 yt Cljt D 0
l!1
P i 0 since liml!1 tjCl D0 ‰j D ‰.1/ and ˇ ‰.1/ D 0. The long-run forecast of the cointegrated system shows that the cointegrated relationship holds, although there might exist some deviations from the equilibrium status in the short-run. The covariance matrix of the predict error et Cljt D yt Cl yt Cljt is †.l/ D
l l i l i X X X Œ. ‰j /†. ‰j0 / i D1 j D0
j D0
When the linear process is represented as a VECM(p) model, you can obtain
yt D …yt
1C
p X1
ˆj yt
j D1
The transition equation is defined as zt D F zt
1
C et
j
C ı C t
I(2) Model F 2169
where zt D .y0t 2 6 6 6 F D6 6 4
0 0 1 ; yt ; yt 1 ;
; y0t
Ik Ik 0 … .… C ˆ1 / ˆ2 0 Ik 0 :: :: :: : : : 0 0
0 pC2 /
0 ˆp 0 :: :: : : Ik 0
is a state vector and the transition matrix is 3
1
7 7 7 7 7 5
where 0 is a k k zero matrix. The observation equation can be written yt D ıt C H zt where H D ŒIk ; Ik ; 0; : : : ; 0. The l-step-ahead forecast is computed as yt Cljt D ı.t C l/ C HF l zt
Cointegration with Exogenous Variables The error correction model with exogenous variables can be written as follows: 0
yt D ˛ˇ yt
1
C
p X1
ˆi yt i
C ADt C
i D1
s X
‚i xt
i
C t
i D0
The following statements demonstrate how to fit VECMX(p; s), where p D 2 and s D 1 from the P=2 and XLAG=1 options: proc varmax data=simul3; model y1 y2 = x1 / p=2 xlag=1 ecm=(rank=1); run;
The following statements demonstrate how to BVECMX(2,1): proc varmax data=simul3; model y1 y2 = x1 / p=2 xlag=1 ecm=(rank=1) prior=(lambda=0.9 theta=0.1); run;
I(2) Model The VARX(p,s) model can be written in the error correction form: yt D ˛ˇ 0 yt
1C
p X1 i D1
ˆi yt
i
C ADt C
s X i D0
‚i xt
i
C t
2170 F Chapter 32: The VARMAX Procedure
Pp
1 i D1 ˆi .
Let ˆ D Ik
If ˛ and ˇ have full-rank r, and rank.˛0? ˆ ˇ? / D k
r, then yt is an I.1/ process.
If the condition rank.˛0? ˆ ˇ? / D k r fails and ˛0? ˆ ˇ? has reduced-rank ˛0? ˆ ˇ? D 0 where and are .k r/ s matrices with s k r, then ˛? and ˇ? are defined as k .k r/ matrices of full rank such that ˛0 ˛? D 0 and ˇ 0 ˇ? D 0. If and have full-rank s, then the process yt is I.2/, which has the implication of I.2/ model for the moving-average representation.
yt D B0 C B1 t C C2
j t X X
i C C1
j D1 i D1
t X
i C C0 .B/t
i D1
The matrices C1 , C2 , and C0 .B/ are determined by the cointegration properties of the process, and B0 and B1 are determined by the initial values. For details, see Johansen (1995a). The implication of the I.2/ model for the autoregressive representation is given by
2
yt D …yt
ˆ yt
1
1
C
p X2
2
‰i yt
i
C ADt C
i D1
Pp
1 j Di C1 ˆi
where ‰i D
and ˆ D Ik
s X
‚i xt
i
C t
i D0
Pp
1 i D1 ˆi .
Test for I(2)

The I(2) cointegrated model is given by the following parameter restrictions:

   H_{r,s}: Π = αβ' and α⊥' Φ* β⊥ = ξη'

where ξ and η are (k - r) x s matrices with 0 ≤ s ≤ k - r. Let H_r^0 represent the I(1) model where α and β have full rank r, let H_{r,s}^0 represent the I(2) model where ξ and η have full rank s, and let H_{r,s} represent the I(2) model where ξ and η have rank less than or equal to s. The following table shows the relation between the I(1) models and the I(2) models.

Table 32.2 Relation Between the I(1) and I(2) Models

                           I(2)                                I(1)
   r\k-r-s    k        k-1      ...    1
   0        H_{00}   H_{01}    ...   H_{0,k-1}   H_{0,k}   = H_0^0
   1                 H_{10}    ...   H_{1,k-2}   H_{1,k-1} = H_1^0
   :                            :       :           :          :
   k-1                               H_{k-1,0}   H_{k-1,1} = H_{k-1}^0
Johansen (1995a) proposed the two-step procedure to analyze the I(2) model. In the first step, the values of (r, α, β) are estimated using the reduced rank regression analysis, performing the regression analysis of Δ²y_t, Δy_{t-1}, and y_{t-1} on Δ²y_{t-1}, ..., Δ²y_{t-p+2}, and D_t. This gives residuals R_{0t}, R_{1t}, and R_{2t}, and residual product moment matrices

   M_{ij} = (1/T) Σ_{t=1}^{T} R_{it} R'_{jt}   for i, j = 0, 1, 2

Perform the reduced rank regression analysis of Δ²y_t on y_{t-1} corrected for Δy_{t-1}, Δ²y_{t-1}, ..., Δ²y_{t-p+2}, and D_t, and solve the eigenvalue problem of the equation

   |λ M_{22.1} - M_{20.1} M_{00.1}^{-1} M_{02.1}| = 0

where M_{ij.1} = M_{ij} - M_{i1} M_{11}^{-1} M_{1j} for i, j = 0, 2.

In the second step, if (r, α, β) are known, the values of (s, ξ, η) are determined using the reduced rank regression analysis, regressing α̂⊥' Δ²y_t on β̂⊥' Δy_{t-1} corrected for Δ²y_{t-1}, ..., Δ²y_{t-p+2}, D_t, and β̂' Δy_{t-1}. The reduced rank regression analysis reduces to the solution of an eigenvalue problem for the equation

   |ρ M_{β⊥β⊥.β} - M_{β⊥α⊥.β} M_{α⊥α⊥.β}^{-1} M_{α⊥β⊥.β}| = 0
where

   M_{β⊥β⊥.β} = β⊥' (M_{11} - M_{11} β (β'M_{11}β)^{-1} β'M_{11}) β⊥
   M_{α⊥β⊥.β} = ᾱ⊥' (M_{01} - M_{01} β (β'M_{11}β)^{-1} β'M_{11}) β⊥ = M'_{β⊥α⊥.β}
   M_{α⊥α⊥.β} = ᾱ⊥' (M_{00} - M_{01} β (β'M_{11}β)^{-1} β'M_{10}) ᾱ⊥

where ᾱ = α(α'α)^{-1}.
The solution gives eigenvalues 1 > ρ_1 > ... > ρ_s > 0 and eigenvectors (v_1, ..., v_s). Then, the ML estimators are

   η̂ = (v_1, ..., v_s)
   ξ̂ = M_{α⊥β⊥.β} η̂

The likelihood ratio test for the reduced rank model H_{r,s} with rank ≤ s in the model H_{r,k-r} = H_r^0 is given by

   Q_{r,s} = -T Σ_{i=s+1}^{k-r} log(1 - ρ_i),   s = 0, ..., k - r - 1

The following statements compute the rank test to test for cointegrated order 2:
   proc varmax data=simul2;
      model y1 y2 / p=2 cointtest=(johansen=(iorder=2));
   run;
The last two columns in Figure 32.60 explain the cointegration rank test with integrated order 1. The results indicate a cointegrated relationship with cointegration rank 1 at the 0.05 significance level because the test statistic of 0.5552 is smaller than the critical value of 3.84. Now, look at the row associated with r = 1. Compare the test statistic value, 211.84512, to the critical value, 3.84, for the cointegrated order 2. There is no evidence that the series are integrated order 2 at the 0.05 significance level.

Figure 32.60 Cointegrated I(2) Test (IORDER= Option)

The VARMAX Procedure

Cointegration Rank Test for I(2)

   r\k-r-s             2            1   Trace of I(1)   5% CV of I(1)
   0          720.40735    308.69199         61.7522           15.34
   1                       211.84512          0.5552            3.84
   5% CV I(2)  15.34000      3.84000
Multivariate GARCH Modeling

Stochastic volatility modeling is important in many areas, particularly in finance. To study the volatility of time series, GARCH models are widely used because they provide a good approach to conditional variance modeling.

BEKK Representation

Engle and Kroner (1995) propose a general multivariate GARCH model and call it a BEKK representation. Let F(t-1) be the sigma field generated by the past values of ε_t, and let H_t be the conditional covariance matrix of the k-dimensional random vector ε_t. Let H_t be measurable with respect to F(t-1); then the multivariate GARCH model can be written as

   ε_t | F(t-1) ~ N(0, H_t)

   H_t = C + Σ_{i=1}^{q} A_i' ε_{t-i} ε'_{t-i} A_i + Σ_{i=1}^{p} G_i' H_{t-i} G_i

where C, A_i, and G_i are k x k parameter matrices.
Consider a bivariate GARCH(1,1) model as follows:

   H_t = [ c_11  c_12 ]   [ a_11  a_12 ]' [ ε²_{1,t-1}           ε_{1,t-1}ε_{2,t-1} ] [ a_11  a_12 ]
         [ c_12  c_22 ] + [ a_21  a_22 ]  [ ε_{2,t-1}ε_{1,t-1}   ε²_{2,t-1}         ] [ a_21  a_22 ]

         [ g_11  g_12 ]'         [ g_11  g_12 ]
       + [ g_21  g_22 ]  H_{t-1} [ g_21  g_22 ]

or, representing the univariate model,

   h_{11,t} = c_11 + a²_11 ε²_{1,t-1} + 2 a_11 a_21 ε_{1,t-1} ε_{2,t-1} + a²_21 ε²_{2,t-1}
            + g²_11 h_{11,t-1} + 2 g_11 g_21 h_{12,t-1} + g²_21 h_{22,t-1}

   h_{12,t} = c_12 + a_11 a_12 ε²_{1,t-1} + (a_21 a_12 + a_11 a_22) ε_{1,t-1} ε_{2,t-1} + a_21 a_22 ε²_{2,t-1}
            + g_11 g_12 h_{11,t-1} + (g_21 g_12 + g_11 g_22) h_{12,t-1} + g_21 g_22 h_{22,t-1}

   h_{22,t} = c_22 + a²_12 ε²_{1,t-1} + 2 a_12 a_22 ε_{1,t-1} ε_{2,t-1} + a²_22 ε²_{2,t-1}
            + g²_12 h_{11,t-1} + 2 g_12 g_22 h_{12,t-1} + g²_22 h_{22,t-1}

For the BEKK representation of the bivariate GARCH(1,1) model, the SAS statements are:

   model y1 y2;
   garch q=1 p=1 form=bekk;
CCC Representation

Bollerslev (1990) proposes a multivariate GARCH model with time-varying conditional variances and covariances but constant conditional correlations. The conditional covariance matrix H_t consists of

   H_t = D_t Γ D_t

where D_t is a k x k stochastic diagonal matrix with element σ_{it} and Γ is a k x k time-invariant matrix with the typical element ρ_{ij}.

The elements of H_t are

   h_{ii,t} = c_i + Σ_{l=1}^{q} a_{ii,l} ε²_{i,t-l} + Σ_{l=1}^{p} g_{ii,l} h_{ii,t-l},   i = 1, ..., k

   h_{ij,t} = ρ_{ij} (h_{ii,t} h_{jj,t})^{1/2},   i ≠ j
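For the CCC representation, the GARCH statement uses FORM=CCC instead of FORM=BEKK. For example, the following statements are a minimal sketch (assuming the garch data set that is generated in the next section) that fits a CCC GARCH(1,1) model:

   proc varmax data=garch;
      model y1 y2;
      garch q=1 p=1 form=ccc;
   run;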
Estimation of GARCH Model

The log-likelihood function of the multivariate GARCH model is written without a constant term as

   ℓ = -(1/2) Σ_{t=1}^{T} [ log|H_t| + ε_t' H_t^{-1} ε_t ]

The log-likelihood function is maximized by an iterative numerical method such as quasi-Newton optimization. The starting values for the regression parameters are obtained from the least squares estimates. The covariance of ε_t is used as the starting values for the GARCH constant parameters, and the starting value used for the other GARCH parameters is either 10^{-6} or 10^{-3}, depending on the GARCH model's representation. For the identification of the parameters of a BEKK representation GARCH model, the diagonal elements of the GARCH constant, the ARCH, and the GARCH parameters are restricted to be positive.
Covariance Stationarity

Define the multivariate GARCH process as

   h_t = Σ_{i=1}^{∞} G(B)^{i-1} [ c + A(B)η_t ]

where h_t = vec(H_t), c = vec(C_0), and η_t = vec(ε_t ε_t'). This representation is equivalent to a GARCH(p, q) model by the following algebra:

   h_t = c + A(B)η_t + Σ_{i=2}^{∞} G(B)^{i-1} [ c + A(B)η_t ]
       = c + A(B)η_t + G(B) Σ_{i=1}^{∞} G(B)^{i-1} [ c + A(B)η_t ]
       = c + A(B)η_t + G(B) h_t

Defining A(B) = Σ_{i=1}^{q} (A_i ⊗ A_i)' B^i and G(B) = Σ_{i=1}^{p} (G_i ⊗ G_i)' B^i gives a BEKK representation.

The necessary and sufficient condition for covariance stationarity of the multivariate GARCH process is that all the eigenvalues of A(1) + G(1) are less than one in modulus.
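This condition can be checked numerically. The following PROC IML statements are a sketch that builds A(1) + G(1) for the bivariate ARCH(1) estimates obtained in the next section (so G_1 = 0 here) and prints the eigenvalue moduli; for these values the moduli reproduce the GARCH characteristic roots shown later in Figure 32.65:

   proc iml;
      /* ARCH coefficient matrix from the VAR(1)-ARCH(1) example */
      A1 = {0.37757 0.05649,
            0.03216 0.71002};
      G1 = j(2,2,0);                  /* no GARCH term in an ARCH(1) model */
      M  = t(A1 @ A1) + t(G1 @ G1);   /* A(1) + G(1), using the Kronecker product */
      ev = eigval(M);                 /* n x 1 if all eigenvalues are real, else n x 2 */
      if ncol(ev) = 1 then mod = abs(ev);
      else mod = sqrt(ev[,1]##2 + ev[,2]##2);
      print mod;                      /* covariance stationary if all moduli < 1 */
   quit;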
An Example of a VAR(1)–ARCH(1) Model

The following DATA step simulates a bivariate vector time series to provide test data for the multivariate GARCH model:

   data garch;
      retain seed 16587;
      esq1 = 0; esq2 = 0;
      ly1 = 0; ly2 = 0;
      do i = 1 to 1000;
         ht = 6.25 + 0.5*esq1;
         call rannor(seed,ehat);
         e1 = sqrt(ht)*ehat;
         ht = 1.25 + 0.7*esq2;
         call rannor(seed,ehat);
         e2 = sqrt(ht)*ehat;
         y1 = 2 + 1.2*ly1 - 0.5*ly2 + e1;
         y2 = 4 + 0.6*ly1 + 0.3*ly2 + e2;
         if i>500 then output;
         esq1 = e1*e1; esq2 = e2*e2;
         ly1 = y1; ly2 = y2;
      end;
      keep y1 y2;
   run;
The following statements fit a VAR(1)–ARCH(1) model to the data. For a VAR–ARCH model, you specify the order of the autoregressive model with the P=1 option in the MODEL statement and the Q=1 option in the GARCH statement. In order to produce the initial and final values of parameters, the TECH=QN option is specified in the NLOPTIONS statement.

   proc varmax data=garch;
      model y1 y2 / p=1
                    print=(roots estimates diagnose);
      garch q=1;
      nloptions tech=qn;
   run;
Figure 32.61 through Figure 32.65 show the details of this example. Figure 32.61 shows the initial values of parameters.

Figure 32.61 Start Parameter Estimates for the VAR(1)–ARCH(1) Model

The VARMAX Procedure

Optimization Start
Parameter Estimates

    N   Parameter    Estimate    Gradient Objective Function
    1   CONST1       2.249575      5.787988
    2   CONST2       3.902673     -4.856056
    3   AR1_1_1      1.231775    -17.155796
    4   AR1_2_1      0.576890     23.991176
    5   AR1_1_2     -0.528405     14.656979
    6   AR1_2_2      0.343714    -12.763695
    7   GCHC1_1      9.929763     -0.111361
    8   GCHC1_2      0.193163     -0.684986
    9   GCHC2_2      4.063245      0.139403
   10   ACH1_1_1     0.001000     -0.668058
   11   ACH1_2_1     0            -0.068657
   12   ACH1_1_2     0            -0.735896
   13   ACH1_2_2     0.001000     -3.126628

Figure 32.62 shows the final parameter estimates.

Figure 32.62 Results of Parameter Estimates for the VAR(1)–ARCH(1) Model

The VARMAX Procedure

Optimization Results
Parameter Estimates

    N   Parameter    Estimate
    1   CONST1       1.943991
    2   CONST2       4.073898
    3   AR1_1_1      1.220945
    4   AR1_2_1      0.608263
    5   AR1_1_2     -0.527121
    6   AR1_2_2      0.303012
    7   GCHC1_1      8.359045
    8   GCHC1_2     -0.182483
    9   GCHC2_2      1.602739
   10   ACH1_1_1     0.377569
   11   ACH1_2_1     0.032158
   12   ACH1_1_2     0.056491
   13   ACH1_2_2     0.710023
Figure 32.63 shows the conditional variance using the BEKK representation of the ARCH(1) model. The ARCH parameters are estimated by the vectorized parameter matrices.

   ε_t | F(t-1) ~ N(0, H_t)

   H_t = [  8.35905  -0.18250 ]   [ 0.37757  0.05649 ]'                 [ 0.37757  0.05649 ]
         [ -0.18250   1.60275 ] + [ 0.03216  0.71002 ]  ε_{t-1}ε'_{t-1} [ 0.03216  0.71002 ]

Figure 32.63 ARCH(1) Parameter Estimates for the VAR(1)–ARCH(1) Model

The VARMAX Procedure

   Type of Model          VAR(1)-ARCH(1)
   Estimation Method      Maximum Likelihood Estimation
   Representation Type    BEKK

GARCH Model Parameter Estimates

   Parameter    Estimate   Standard Error   t Value   Pr > |t|
   GCHC1_1       8.35905          0.73116     11.43     0.0001
   GCHC1_2      -0.18248          0.21706     -0.84     0.4009
   GCHC2_2       1.60274          0.19398      8.26     0.0001
   ACH1_1_1      0.37757          0.07470      5.05     0.0001
   ACH1_2_1      0.03216          0.06971      0.46     0.6448
   ACH1_1_2      0.05649          0.02622      2.15     0.0317
   ACH1_2_2      0.71002          0.06844     10.37     0.0001
Figure 32.64 shows the AR parameter estimates and their significance. The fitted VAR(1) model with the previous conditional covariance ARCH model is written as follows:

   y_t = [ 1.94399 ]   [ 1.22094  -0.52712 ]
         [ 4.07390 ] + [ 0.60826   0.30301 ] y_{t-1} + ε_t
Figure 32.64 VAR(1) Parameter Estimates for the VAR(1)–ARCH(1) Model

Model Parameter Estimates

   Equation   Parameter   Estimate   Standard Error   t Value   Pr > |t|   Variable
   y1         CONST1       1.94399          0.21017      9.25     0.0001   1
              AR1_1_1      1.22095          0.02564     47.63     0.0001   y1(t-1)
              AR1_1_2     -0.52712          0.02836    -18.59     0.0001   y2(t-1)
   y2         CONST2       4.07390          0.10574     38.53     0.0001   1
              AR1_2_1      0.60826          0.01231     49.42     0.0001   y1(t-1)
              AR1_2_2      0.30301          0.01498     20.23     0.0001   y2(t-1)
Figure 32.65 shows the roots of the AR and ARCH characteristic polynomials. The eigenvalues have a modulus less than one.

Figure 32.65 Roots for the VAR(1)–ARCH(1) Model

Roots of AR Characteristic Polynomial

   Index      Real   Imaginary   Modulus    Radian     Degree
   1       0.76198     0.33163    0.8310    0.4105    23.5197
   2       0.76198    -0.33163    0.8310   -0.4105   -23.5197

Roots of GARCH Characteristic Polynomial

   Index      Real   Imaginary   Modulus   Radian   Degree
   1       0.51180     0.00000    0.5118   0.0000   0.0000
   2       0.26627     0.00000    0.2663   0.0000   0.0000
   3       0.26627     0.00000    0.2663   0.0000   0.0000
   4       0.13853     0.00000    0.1385   0.0000   0.0000
Output Data Sets

The VARMAX procedure can create the OUT=, OUTEST=, OUTHT=, and OUTSTAT= data sets. In general, if processing fails, the output is not recorded or is set to missing in the relevant output data set, and appropriate error and/or warning messages are recorded in the log.
OUT= Data Set

The OUT= data set contains the forecast values produced by the OUTPUT statement. The following output variables can be created:
- the BY variables
- the ID variable
- the MODEL statement dependent (endogenous) variables. These variables contain the actual values from the input data set.
- FORi, numeric variables that contain the forecasts. The FORi variables contain the forecasts for the ith endogenous variable in the MODEL statement list. Forecasts are one-step-ahead predictions until the end of the data or until the observation specified by the BACK= option. Multistep forecasts can be computed after that point based on the LEAD= option.
- RESi, numeric variables that contain the residual for the forecast of the ith endogenous variable in the MODEL statement list. For multistep forecast observations, the actual values are missing and the RESi variables contain missing values.
- STDi, numeric variables that contain the standard deviation for the forecast of the ith endogenous variable in the MODEL statement list. The values of the STDi variables can be used to construct univariate confidence limits for the corresponding forecasts.
- LCIi, numeric variables that contain the lower confidence limits for the corresponding forecasts of the ith endogenous variable in the MODEL statement list.
- UCIi, numeric variables that contain the upper confidence limits for the corresponding forecasts of the ith endogenous variable in the MODEL statement list.

The OUT= data set contains the values shown in Table 32.3 and Table 32.4 for a bivariate case.

Table 32.3 OUT= Data Set

   Obs   ID variable   y1    FOR1   RES1   STD1   LCI1   UCI1
   1     date          y11   f11    r11    σ11    l11    u11
   2     date          y12   f12    r12    σ11    l12    u12
   :      :             :     :      :      :      :      :

Table 32.4 OUT= Data Set Continued

   Obs   y2    FOR2   RES2   STD2   LCI2   UCI2
   1     y21   f21    r21    σ22    l21    u21
   2     y22   f22    r22    σ22    l22    u22
   :      :     :      :      :      :      :
Consider the following example:

   proc varmax data=simul1 noprint;
      id date interval=year;
      model y1 y2 / p=1 noint;
      output out=out lead=5;
   run;

   proc print data=out(firstobs=98);
   run;
The output in Figure 32.66 shows part of the results of the OUT= data set for the preceding example.

Figure 32.66 OUT= Data Set

   Obs   date        y1        FOR1       RES1      STD1       LCI1       UCI1
    98   1997    -0.58433   -0.13500   -0.44934   1.13523   -2.36001    2.09002
    99   1998    -2.07170   -1.00649   -1.06522   1.13523   -3.23150    1.21853
   100   1999    -3.38342   -2.58612   -0.79730   1.13523   -4.81113   -0.36111
   101   2000        .      -3.59212       .      1.13523   -5.81713   -1.36711
   102   2001        .      -3.09448       .      1.70915   -6.44435    0.25539
   103   2002        .      -2.17433       .      2.14472   -6.37792    2.02925
   104   2003        .      -1.11395       .      2.43166   -5.87992    3.65203
   105   2004        .      -0.14342       .      2.58740   -5.21463    4.92779

   Obs       y2        FOR2       RES2      STD2       LCI2       UCI2
    98    0.64397   -0.34932    0.99329   1.19096   -2.68357    1.98492
    99    0.35925   -0.07132    0.43057   1.19096   -2.40557    2.26292
   100   -0.64999   -0.99354    0.34355   1.19096   -3.32779    1.34070
   101       .      -2.09873       .      1.19096   -4.43298    0.23551
   102       .      -2.77050       .      1.47666   -5.66469    0.12369
   103       .      -2.75724       .      1.74212   -6.17173    0.65725
   104       .      -2.24943       .      2.01925   -6.20709    1.70823
   105       .      -1.47460       .      2.25169   -5.88782    2.93863
OUTEST= Data Set

The OUTEST= data set contains estimation results of the fitted model produced by the VARMAX statement. The following output variables can be created:

- the BY variables
- NAME, a character variable that contains the name of endogenous (dependent) variables or the name of the parameters for the covariance of the matrix of the parameter estimates if the OUTCOV option is specified
- TYPE, a character variable that contains the value EST for parameter estimates, the value STD for standard errors of parameter estimates, and the value COV for the covariance of the matrix of the parameter estimates if the OUTCOV option is specified
- CONST, a numeric variable that contains the estimates of constant parameters and their standard errors
- SEASON_i, numeric variables that contain the estimates of seasonal dummy parameters and their standard errors, where i = 1, ..., (nseason - 1), and nseason is based on the NSEASON= option
- LTREND, a numeric variable that contains the estimates of linear trend parameters and their standard errors
- QTREND, a numeric variable that contains the estimates of quadratic trend parameters and their standard errors
- XLl_i, numeric variables that contain the estimates of exogenous parameters and their standard errors, where l is the lag lth coefficient matrix and i = 1, ..., r, where r is the number of exogenous variables
- ARl_i, numeric variables that contain the estimates of autoregressive parameters and their standard errors, where l is the lag lth coefficient matrix and i = 1, ..., k, where k is the number of endogenous variables
- MAl_i, numeric variables that contain the estimates of moving-average parameters and their standard errors, where l is the lag lth coefficient matrix and i = 1, ..., k, where k is the number of endogenous variables
- ACHl_i, numeric variables that contain the estimates of the ARCH parameters of the covariance matrix and their standard errors, where l is the lag lth coefficient matrix and i = 1, ..., k for BEKK and CCC representations, where k is the number of endogenous variables
- GCHl_i, numeric variables that contain the estimates of the GARCH parameters of the covariance matrix and their standard errors, where l is the lag lth coefficient matrix and i = 1, ..., k for BEKK and CCC representations, where k is the number of endogenous variables
- GCHC_i, numeric variables that contain the estimates of the constant parameters of the covariance matrix and their standard errors, where i = 1, ..., k for BEKK representation, k is the number of endogenous variables, and i = 1 for CCC representation
- CCC_i, numeric variables that contain the estimates of the conditional constant correlation parameters for CCC representation, where i = 2, ..., k

The OUTEST= data set contains the values shown in Table 32.5 for a bivariate case.
Table 32.5 OUTEST= Data Set

   Obs   NAME   TYPE   CONST    AR1_1       AR1_2       AR2_1       AR2_2
   1     y1     EST    δ1       φ1,11       φ1,12       φ2,11       φ2,12
   2            STD    se(δ1)   se(φ1,11)   se(φ1,12)   se(φ2,11)   se(φ2,12)
   3     y2     EST    δ2       φ1,21       φ1,22       φ2,21       φ2,22
   4            STD    se(δ2)   se(φ1,21)   se(φ1,22)   se(φ2,21)   se(φ2,22)
Consider the following example:

   proc varmax data=simul2 outest=est;
      model y1 y2 / p=2 noint
                    ecm=(rank=1 normalize=y1)
                    noprint;
   run;

   proc print data=est;
   run;

The output in Figure 32.67 shows the results of the OUTEST= data set.

Figure 32.67 OUTEST= Data Set

   Obs   NAME   TYPE      AR1_1      AR1_2      AR2_1      AR2_2
   1     y1     EST    -0.46680    0.91295   -0.74332   -0.74621
   2            STD     0.04786    0.09359    0.04526    0.04769
   3     y2     EST     0.10667   -0.20862    0.40493   -0.57157
   4            STD     0.05146    0.10064    0.04867    0.05128
OUTHT= Data Set

The OUTHT= data set contains the predictions of the fitted GARCH model produced by the GARCH statement. The following output variables can be created:

- the BY variables
- Hi_j, numeric variables that contain the prediction of the covariance, where 1 ≤ i ≤ j ≤ k, where k is the number of dependent variables

The OUTHT= data set contains the values shown in Table 32.6 for a bivariate case.

Table 32.6 OUTHT= Data Set

   Obs   H1_1   H1_2   H2_2
   1     h111   h121   h221
   2     h112   h122   h222
   :      :      :      :

Consider the following example of the OUTHT= option:

   proc varmax data=garch;
      model y1 y2 / p=1
                    print=(roots estimates diagnose);
      garch q=1 outht=ht;
   run;

   proc print data=ht(firstobs=495);
   run;
The output in Figure 32.68 shows part of the OUTHT= data set.

Figure 32.68 OUTHT= Data Set

   Obs      h1_1       h1_2      h2_2
   495   9.36568   -1.10406   2.44644
   496   8.46807   -0.17464   1.60330
   497   9.19686    0.09762   1.69639
   498   8.40787   -0.33463   2.07687
   499   8.88429    0.03646   1.69401
   500   8.60844   -0.40260   1.79703
OUTSTAT= Data Set

The OUTSTAT= data set contains estimation results of the fitted model produced by the VARMAX statement. The following output variables can be created. The subindex i is 1, ..., k, where k is the number of endogenous variables.

- the BY variables
- NAME, a character variable that contains the name of endogenous (dependent) variables
- SIGMA_i, numeric variables that contain the estimate of the innovation covariance matrix
- AICC, a numeric variable that contains the corrected Akaike information criterion value
- HQC, a numeric variable that contains the Hannan-Quinn information criterion value
- AIC, a numeric variable that contains the Akaike information criterion value
- SBC, a numeric variable that contains the Schwarz Bayesian information criterion value
- FPEC, a numeric variable that contains the final prediction error criterion value
- FValue, a numeric variable that contains the F statistics
- PValue, a numeric variable that contains the p-value for the F statistics

If the JOHANSEN= option is specified, the following items are added:

- Eigenvalue, a numeric variable that contains eigenvalues for the cointegration rank test of integrated order 1
- RestrictedEigenvalue, a numeric variable that contains eigenvalues for the cointegration rank test of integrated order 1 when the NOINT option is not specified
- Beta_i, numeric variables that contain long-run effect parameter estimates, β
- Alpha_i, numeric variables that contain adjustment parameter estimates, α

If the JOHANSEN=(IORDER=2) option is specified, the following items are added:

- EValueI2_i, numeric variables that contain eigenvalues for the cointegration rank test of integrated order 2
- EValueI1, a numeric variable that contains eigenvalues for the cointegration rank test of integrated order 1
- Eta_i, numeric variables that contain the parameter estimates in integrated order 2, η
- Xi_i, numeric variables that contain the parameter estimates in integrated order 2, ξ

The OUTSTAT= data set contains the values shown in Table 32.7 for a bivariate case.
Table 32.7 OUTSTAT= Data Set

   Obs   NAME   SIGMA_1   SIGMA_2   AICC   RSquare   FValue   PValue
   1     y1     σ11       σ12       aicc   R²1       F1       prob1
   2     y2     σ21       σ22       .      R²2       F2       prob2

   Obs   EValueI2_1   EValueI2_2   EValueI1   Beta_1   Beta_2
   1     e11          e12          e1         β11      β12
   2     e21          .            e2         β21      β22

   Obs   Alpha_1   Alpha_2   Eta_1   Eta_2   Xi_1   Xi_2
   1     α11       α12       η11     η12     ξ11    ξ12
   2     α21       α22       η21     η22     ξ21    ξ22
Consider the following example:

   proc varmax data=simul2 outstat=stat;
      model y1 y2 / p=2 noint
                    cointtest=(johansen=(iorder=2))
                    ecm=(rank=1 normalize=y1)
                    noprint;
   run;

   proc print data=stat;
   run;
The output in Figure 32.69 shows the results of the OUTSTAT= data set.
Figure 32.69 OUTSTAT= Data Set

   Obs   NAME   SIGMA_1   SIGMA_2      AICC       HQC       AIC       SBC       FPEC
   1     y1     94.7557    4.5268   9.37221   9.43236   9.36834   9.52661   11712.14
   2     y2      4.5270  109.5700         .         .         .         .          .

   Obs   RSquare    FValue       PValue   EValueI2_1   EValueI2_2   EValueI1    Beta_1     Beta_2
   1     0.93900   482.308   6.1637E-57      0.98486      0.95079    0.50864   1.00000    1.00000
   2     0.93912   483.334   5.6124E-57      0.81451            .    0.01108  -1.95575   -1.33622

   Obs    Alpha_1    Alpha_2       Eta_1      Eta_2      Xi_1       Xi_2
   1     -0.46680   0.007937   -0.012307   0.027030   54.1606   -52.3144
   2      0.10667   0.033530    0.015555   0.023086  -79.4240   -18.3308
Printed Output

The default printed output produced by the VARMAX procedure is described in the following list:

- descriptive statistics, which include the number of observations used, the names of the variables, their means and standard deviations (STD), their minimums and maximums, the differencing operations used, and the labels of the variables
- a type of model to fit the data and an estimation method
- a table of parameter estimates that shows the following for each parameter: the variable name for the left-hand side of the equation, the parameter name, the parameter estimate, the approximate standard error, the t value, the approximate probability (Pr > |t|), and the variable name for the right-hand side of the equation in terms of each parameter
- the innovation covariance matrix
- the information criteria

If PRINT=ESTIMATES is specified, the VARMAX procedure prints the following list with the default printed output:

- the estimates of the constant vector (or seasonal constant matrix), the trend vector, the coefficient matrices of the distributed lags, the AR coefficient matrices, and the MA coefficient matrices
- the ALPHA and BETA parameter estimates for the error correction model
- the schematic representation of parameter estimates

If PRINT=DIAGNOSE is specified, the VARMAX procedure prints the following list with the default printed output:

- the cross-covariance and cross-correlation matrices of the residuals
- the tables of test statistics for the hypothesis that the residuals of the model are white noise:
  – Durbin-Watson (DW) statistics
  – F test for autoregressive conditional heteroscedastic (ARCH) disturbances
  – F test for AR disturbance
  – Jarque-Bera normality test
  – Portmanteau test
ODS Table Names

The VARMAX procedure assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table:

Table 32.8 ODS Tables Produced in the VARMAX Procedure

ODS Table Name                   Description                                                       Option

ODS Tables Created by the MODEL Statement
AccumImpulse                     Accumulated impulse response matrices                             IMPULSE=(ACCUM) IMPULSE=(ALL)
AccumImpulsebyVar                Accumulated impulse response by variable                          IMPULSE=(ACCUM) IMPULSE=(ALL)
AccumImpulseX                    Accumulated transfer function matrices                            IMPULSX=(ACCUM) IMPULSX=(ALL)
AccumImpulseXbyVar               Accumulated transfer function by variable                         IMPULSX=(ACCUM) IMPULSX=(ALL)
Alpha                            α coefficients                                                    JOHANSEN=
AlphaInECM                       α coefficients when rank=r                                        ECM=
AlphaOnDrift                     α coefficients under the restriction of a deterministic term      JOHANSEN=
AlphaBetaInECM                   Π = αβ' coefficients when rank=r                                  ECM=
ANOVA                            Univariate model diagnostic checks for the residuals              PRINT=DIAGNOSE
ARCoef                           AR coefficients                                                   P=
ARRoots                          Roots of AR characteristic polynomial                             ROOTS with P=
Beta                             β coefficients                                                    JOHANSEN=
BetaInECM                        β coefficients when rank=r                                        ECM=
BetaOnDrift                      β coefficients under the restriction of a deterministic term      JOHANSEN=
Constant                         Constant estimates                                                without NOINT
CorrB                            Correlations of parameter estimates                               CORRB
CorrResiduals                    Correlations of residuals                                         PRINT=DIAGNOSE
CorrResidualsbyVar               Correlations of residuals by variable                             PRINT=DIAGNOSE
CorrResidualsGraph               Schematic representation of correlations of residuals             PRINT=DIAGNOSE
CorrXGraph                       Schematic representation of sample correlations of independent series   CORRX
CorrYGraph                       Schematic representation of sample correlations of dependent series     CORRY
CorrXLags                        Correlations of independent series                                CORRX
CorrXbyVar                       Correlations of independent series by variable                    CORRX
CorrYLags                        Correlations of dependent series                                  CORRY
CorrYbyVar                       Correlations of dependent series by variable                      CORRY
CovB                             Covariances of parameter estimates                                COVB
CovInnovation                    Covariances of the innovations                                    default
CovPredictError                  Covariance matrices of the prediction error                       COVPE
CovPredictErrorbyVar             Covariances of the prediction error by variable                   COVPE
CovResiduals                     Covariances of residuals                                          PRINT=DIAGNOSE
CovResidualsbyVar                Covariances of residuals by variable                              PRINT=DIAGNOSE
CovXLags                         Covariances of independent series                                 COVX
CovXbyVar                        Covariances of independent series by variable                     COVX
CovYLags                         Covariances of dependent series                                   COVY
CovYbyVar                        Covariances of dependent series by variable                       COVY
DecomposeCovPredictError         Decomposition of the prediction error covariances                 DECOMPOSE
DecomposeCovPredictErrorbyVar    Decomposition of the prediction error covariances by variable     DECOMPOSE
DFTest                           Dickey-Fuller test                                                DFTEST
DiagnostAR                       Test the AR disturbance for the residuals                         PRINT=DIAGNOSE
DiagnostWN                       Test the ARCH disturbance and normality for the residuals         PRINT=DIAGNOSE
DynamicARCoef                    AR coefficients of the dynamic model                              DYNAMIC
DynamicConstant                  Constant estimates of the dynamic model                           DYNAMIC
DynamicCovInnovation             Covariances of the innovations of the dynamic model               DYNAMIC
DynamicLinearTrend               Linear trend estimates of the dynamic model                       DYNAMIC
DynamicMACoef                    MA coefficients of the dynamic model                              DYNAMIC
DynamicSConstant                 Seasonal constant estimates of the dynamic model                  DYNAMIC
DynamicParameterEstimates        Parameter estimates table of the dynamic model                    DYNAMIC
DynamicParameterGraph            Schematic representation of the parameters of the dynamic model   DYNAMIC
DynamicQuadTrend                 Quadratic trend estimates of the dynamic model                    DYNAMIC
DynamicSeasonGraph               Schematic representation of the seasonal dummies of the dynamic model   DYNAMIC
DynamicXLagCoef                  Dependent coefficients of the dynamic model                       DYNAMIC
Hypothesis                       Hypothesis of different deterministic terms in cointegration rank test   JOHANSEN=
HypothesisTest                   Test hypothesis of different deterministic terms in cointegration rank test   JOHANSEN=
EigenvalueI2                     Eigenvalues in integrated order 2                                 JOHANSEN=(IORDER=2)
Eta                              η coefficients                                                    JOHANSEN=(IORDER=2)
InfiniteARRepresent              Infinite order AR representation                                  IARR
InfoCriteria                     Information criteria                                              default
LinearTrend                      Linear trend estimates                                            TREND=
MACoef                           MA coefficients                                                   Q=
MARoots                          Roots of MA characteristic polynomial                             ROOTS with Q=
MaxTest                          Cointegration rank test using the maximum eigenvalue              JOHANSEN=(TYPE=MAX)
Minic                            Tentative order selection                                         MINIC MINIC=
ModelType                        Type of model                                                     default
NObs                             Number of observations                                            default
OrthoImpulse                     Orthogonalized impulse response matrices                          IMPULSE=(ORTH) IMPULSE=(ALL)
OrthoImpulsebyVar                Orthogonalized impulse response by variable                       IMPULSE=(ORTH) IMPULSE=(ALL)
ParameterEstimates               Parameter estimates table                                         default
ParameterGraph                   Schematic representation of the parameters                        PRINT=ESTIMATES
PartialAR                        Partial autoregression matrices                                   PARCOEF
PartialARGraph                   Schematic representation of partial autoregression                PARCOEF
PartialCanCorr                   Partial canonical correlation analysis                            PCANCORR
PartialCorr                      Partial cross-correlation matrices                                PCORR
PartialCorrbyVar                 Partial cross-correlations by variable                            PCORR
PartialCorrGraph                 Schematic representation of partial cross-correlations            PCORR
PortmanteauTest                  Chi-square test table for residual cross-correlations             PRINT=DIAGNOSE
ProportionCovPredictError        Proportions of prediction error covariance decomposition          DECOMPOSE
ProportionCovPredictErrorbyVar   Proportions of prediction error covariance decomposition by variable   DECOMPOSE
RankTestI2                       Cointegration rank test in integrated order 2                     JOHANSEN=(IORDER=2)
RestrictMaxTest                  Cointegration rank test using the maximum eigenvalue under the restriction of a deterministic term   JOHANSEN=(TYPE=MAX) without NOINT
RestrictTraceTest                Cointegration rank test using the trace under the restriction of a deterministic term   JOHANSEN=(TYPE=TRACE) without NOINT
QuadTrend                        Quadratic trend estimates                                         TREND=QUAD
SeasonGraph                      Schematic representation of the seasonal dummies                  PRINT=ESTIMATES
SConstant                        Seasonal constant estimates                                       NSEASON=
SimpleImpulse                    Impulse response matrices                                         IMPULSE=(SIMPLE) IMPULSE=(ALL)
SimpleImpulsebyVar               Impulse response by variable                                      IMPULSE=(SIMPLE) IMPULSE=(ALL)
SimpleImpulseX                   Impulse response matrices of transfer function                    IMPULSX=(SIMPLE) IMPULSX=(ALL)
SimpleImpulseXbyVar              Impulse response of transfer function by variable                 IMPULSX=(SIMPLE) IMPULSX=(ALL)
Summary                          Simple summary statistics                                         default
SWTest                           Common trends test                                                SW=
TraceTest                        Cointegration rank test using the trace                           JOHANSEN=(TYPE=TRACE)
Xi                               ξ coefficient matrix                                              JOHANSEN=(IORDER=2)
XLagCoef                         Dependent coefficients                                            XLAG=
YWEstimates                      Yule-Walker estimates                                             YW

ODS Tables Created by the GARCH Statement
ARCHCoef                         ARCH coefficients                                                 Q=
GARCHCoef                        GARCH coefficients                                                P=
GARCHConstant                    GARCH constant estimates                                          PRINT=ESTIMATES
GARCHParameterEstimates          GARCH parameter estimates table                                   default
GARCHParameterGraph              Schematic representation of the GARCH parameters                  PRINT=ESTIMATES
GARCHRoots                       Roots of GARCH characteristic polynomial                          ROOTS

ODS Tables Created by the COINTEG Statement or the ECM Option
AlphaInECM                       α coefficients when rank=r                                        PRINT=ESTIMATES
AlphaBetaInECM                   Π = αβ' coefficients when rank=r                                  PRINT=ESTIMATES
AlphaOnAlpha                     α coefficients under the restriction of α                         J=
AlphaOnBeta                      α coefficients under the restriction of β                         H=
AlphaTestResults                 Hypothesis testing of α                                           J=
BetaInECM                        β coefficients when rank=r                                        PRINT=ESTIMATES
BetaOnBeta                       β coefficients under the restriction of β                         H=
BetaOnAlpha                      β coefficients under the restriction of α                         J=
BetaTestResults                  Hypothesis testing of β                                           H=
GrangerRepresent                 Coefficient of Granger representation                             PRINT=ESTIMATES
HMatrix                          Restriction matrix for β                                          H=
JMatrix                          Restriction matrix for α                                          J=
WeakExogeneity                   Testing weak exogeneity of each dependent variable with respect to BETA   EXOGENEITY

ODS Tables Created by the CAUSAL Statement
CausalityTest                    Granger causality test                                            default
GroupVars                        Two groups of variables                                           default

ODS Tables Created by the RESTRICT Statement
Restrict                         Restriction table                                                 default

ODS Tables Created by the TEST Statement
Test                             Wald test                                                         default

ODS Tables Created by the OUTPUT Statement
Forecasts                        Forecasts table                                                   without NOPRINT

Note that the ODS table names suffixed by "byVar" can be obtained with the PRINTFORM=UNIVARIATE option.
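For example, you can use the standard ODS OUTPUT statement with any of these table names to save a table to a data set. The following statements are a sketch (assuming the simul1 data set used earlier) that stores the parameter estimates table:

   ods output ParameterEstimates=pest;

   proc varmax data=simul1;
      model y1 y2 / p=1 noint;
   run;

   proc print data=pest;
   run;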
ODS Graphics

This section describes the use of ODS for creating statistical graphs with the VARMAX procedure. To request these graphs, you must specify the ODS GRAPHICS ON statement. When ODS Graphics is in effect, the VARMAX procedure produces a variety of plots for each dependent variable.

The procedure displays the following plots for each dependent variable in the MODEL statement with the PLOT= option in the VARMAX statement:

- impulse response function
- impulse response of the transfer function
- time series and predicted series
- prediction errors
- distribution of the prediction errors
- normal quantile of the prediction errors
- ACF of the prediction errors
- PACF of the prediction errors
- IACF of the prediction errors
- log scaled white noise test of the prediction errors

The procedure displays forecast plots for each dependent variable in the OUTPUT statement with the PLOT= option in the VARMAX statement.
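For example, the following statements are a sketch, modeled on Example 32.4, that enables ODS Graphics and requests the model and forecast plots:

   ods graphics on;

   proc varmax data=sashelp.workers plot=(model forecasts);
      id date interval=month;
      model electric masonry / dify=(1,12) noint p=1;
      output lead=12;
   run;

   ods graphics off;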
ODS Graph Names

The VARMAX procedure assigns a name to each graph it creates by using ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 32.9.

Table 32.9 ODS Graphics Produced in the VARMAX Procedure

ODS Graph Name          Plot Description                                         Statement
ErrorACFPlot            Autocorrelation function of prediction errors            MODEL
ErrorIACFPlot           Inverse autocorrelation function of prediction errors    MODEL
ErrorPACFPlot           Partial autocorrelation function of prediction errors    MODEL
ErrorDiagnosticsPanel   Diagnostics of prediction errors                         MODEL
ErrorNormalityPanel     Histogram and Q-Q plot of prediction errors              MODEL
ErrorDistribution       Distribution of prediction errors                        MODEL
ErrorQQPlot             Q-Q plot of prediction errors                            MODEL
ErrorWhiteNoisePlot     White noise test of prediction errors                    MODEL
ErrorPlot               Prediction errors                                        MODEL
ModelPlot               Time series and predicted series                         MODEL
AccumulatedIRFPanel     Accumulated impulse response function                    MODEL
AccumulatedIRFXPanel    Accumulated impulse response of transfer function        MODEL
OrthogonalIRFPanel      Orthogonalized impulse response function                 MODEL
SimpleIRFPanel          Simple impulse response function                         MODEL
SimpleIRFXPanel         Simple impulse response of transfer function             MODEL
ModelForecastsPlot      Time series and forecasts                                OUTPUT
ForecastsOnlyPlot       Forecasts                                                OUTPUT
Computational Issues

Computational Method

The VARMAX procedure uses numerous linear algebra routines and frequently uses the sweep operator (Goodnight 1979) and the Cholesky root (Golub and Van Loan 1983). In addition, the VARMAX procedure uses the nonlinear optimization (NLO) subsystem to perform nonlinear optimization tasks for the maximum likelihood estimation. The optimization requires intensive computation.

Convergence Problems

For some data sets, the computation algorithm can fail to converge. Nonconvergence can result from a number of causes, including flat or ridged likelihood surfaces and ill-conditioned data. If you experience convergence problems, the following points might be helpful:

- Data that contain extreme values can affect results in PROC VARMAX. Rescaling the data can improve stability.
- Changing the TECH=, MAXITER=, and MAXFUNC= options in the NLOPTIONS statement can improve the stability of the optimization process (see the sketch after this list).
- Specifying a different model that fits the data more closely might improve convergence.
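For example, the following statements are a sketch (the option values are illustrative, not recommendations) that changes the optimization technique and raises the iteration and function-call limits for a GARCH model fit:

   proc varmax data=garch;
      model y1 y2 / p=1;
      garch q=1;
      nloptions tech=qn maxiter=1000 maxfunc=2000;
   run;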
Memory

Let T be the length of each series, k be the number of dependent variables, p be the order of autoregressive terms, and q be the order of moving-average terms. The number of parameters to estimate for a VARMA(p, q) model is

   k + (p + q) k² + k(k + 1)/2

As k increases, the number of parameters to estimate increases very quickly. Furthermore, the memory requirement for VARMA(p, q) increases quadratically as k and T increase.

For a VARMAX(p, q, s) model and GARCH-type multivariate conditional heteroscedasticity models, the number of parameters to estimate and the memory requirements are considerable.
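As a quick illustration of how fast the parameter count grows with k, the following DATA step sketch evaluates the VARMA(p, q) formula for several values of k with p = 2 and q = 1:

   data parmcount;
      p = 2; q = 1;
      do k = 2 to 10 by 2;
         /* constants + AR and MA coefficient matrices + innovation covariance */
         nparm = k + (p + q)*k**2 + k*(k + 1)/2;
         output;
      end;
   run;

   proc print data=parmcount;
   run;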
Computing Time

PROC VARMAX is computationally intensive, and execution times can be long. Extensive CPU time is often required to compute the maximum likelihood estimates.
Examples: VARMAX Procedure
Example 32.1: Analysis of U.S. Economic Variables

Consider the following four-dimensional system of U.S. economic variables. Quarterly data for the years 1954 to 1987 are used (Lütkepohl 1993, Table E.3).

   title 'Analysis of U.S. Economic Variables';

   data us_money;
      date=intnx( 'qtr', '01jan54'd, _n_-1 );
      format date yyq. ;
      input y1 y2 y3 y4 @@;
      y1=log(y1);
      y2=log(y2);
      label y1='log(real money stock M1)'
            y2='log(GNP in bil. of 1982 dollars)'
            y3='Discount rate on 91-day T-bills'
            y4='Yield on 20-year Treasury bonds';
   datalines;
   450.9 1406.8 0.010800000 0.026133333
   453.0 1401.2 0.0081333333 0.025233333

      ... more lines ...
The following statements plot the series and proceed with the VARMAX procedure.
   proc timeseries data=us_money vectorplot=series;
      id date interval=qtr;
      var y1 y2;
   run;
Output 32.1.1 shows the plot of the variables y1 and y2.

Output 32.1.1 Plot of Data
The following statements plot the variables y3 and y4.

   proc timeseries data=us_money vectorplot=series;
      id date interval=qtr;
      var y3 y4;
   run;
Output 32.1.2 shows the plot of the variables y3 and y4.
Output 32.1.2 Plot of Data
   proc varmax data=us_money;
      id date interval=qtr;
      model y1-y4 / p=2 lagmax=6 dftest
                    print=(iarr(3) estimates diagnose)
                    cointtest=(johansen=(iorder=2))
                    ecm=(rank=1 normalize=y1);
      cointeg rank=1 normalize=y1 exogeneity;
   run;
This example performs the Dickey-Fuller test for stationarity, the Johansen cointegration rank test for integrated order 2, and the exogeneity test. The VECM(2) is fit to the data. From the outputs shown in Output 32.1.5, you can see that the series has unit roots and is cointegrated with rank 1 and integrated order 1. The fitted VECM(2) is given as

          [  0.0408 ]   [ -0.0140   0.0065  -0.2026   0.1306 ]
   Δy_t = [  0.0860 ] + [ -0.0281   0.0131  -0.4080   0.2630 ] y_{t-1}
          [  0.0052 ]   [ -0.0022   0.0010  -0.0312   0.0201 ]
          [ -0.0144 ]   [  0.0051  -0.0024   0.0741  -0.0477 ]

          [ 0.3460   0.0913  -0.3535  -0.9690 ]
        + [ 0.0994   0.0379   0.2390   0.2866 ] Δy_{t-1} + ε_t
          [ 0.1812   0.0786   0.0223   0.4051 ]
          [ 0.0322   0.0496  -0.0329   0.1857 ]
The Δ prefixed to a variable name implies differencing. Output 32.1.3 through Output 32.1.14 show the details. Output 32.1.3 shows the descriptive statistics.

Output 32.1.3 Descriptive Statistics

Analysis of U.S. Economic Variables

The VARMAX Procedure

   Number of Observations       136
   Number of Pairwise Missing     0

Simple Summary Statistics

   Variable   Type          N      Mean   Standard Deviation       Min       Max
   y1         Dependent   136   6.21295              0.07924   6.10278   6.45331
   y2         Dependent   136   7.77890              0.30110   7.24508   8.27461
   y3         Dependent   136   0.05608              0.03109   0.00813   0.15087
   y4         Dependent   136   0.06458              0.02927   0.02490   0.13600

Simple Summary Statistics

   Variable   Label
   y1         log(real money stock M1)
   y2         log(GNP in bil. of 1982 dollars)
   y3         Discount rate on 91-day T-bills
   y4         Yield on 20-year Treasury bonds
Output 32.1.4 shows the output for the Dickey-Fuller tests for the nonstationarity of each series. The null hypothesis is that there is a unit root. All series have a unit root.

Output 32.1.4 Unit Root Tests

Unit Root Test

   Variable   Type             Rho   Pr < Rho     Tau   Pr < Tau
   y1         Zero Mean       0.05     0.6934    1.14     0.9343
              Single Mean    -2.97     0.6572   -0.76     0.8260
              Trend          -5.91     0.7454   -1.34     0.8725
   y2         Zero Mean       0.13     0.7124    5.14     0.9999
              Single Mean    -0.43     0.9309   -0.79     0.8176
              Trend          -9.21     0.4787   -2.16     0.5063
   y3         Zero Mean      -1.28     0.4255   -0.69     0.4182
              Single Mean    -8.86     0.1700   -2.27     0.1842
              Trend         -18.97     0.0742   -2.86     0.1803
   y4         Zero Mean       0.40     0.7803    0.45     0.8100
              Single Mean    -2.79     0.6790   -1.29     0.6328
              Trend         -12.12     0.2923   -2.33     0.4170
The Johansen cointegration rank test shows whether the series is integrated of order either 1 or 2, as shown in Output 32.1.5. The last two columns in Output 32.1.5 explain the cointegration rank test with integrated order 1. The results indicate a cointegrated relationship with cointegration rank 1 at the 0.05 significance level because the test statistic of 20.6542 is smaller than the critical value of 29.38. Now, look at the row associated with r = 1. Compare the test statistic value and critical value pairs such as (219.62395, 29.38), (89.21508, 15.34), and (27.32609, 3.84). There is no evidence that the series are integrated order 2 at the 0.05 significance level.

Output 32.1.5 Cointegration Rank Test

Cointegration Rank Test for I(2)

   r\k-r-s              4           3           2          1   Trace of I(1)   5% CV of I(1)
   0            384.60903   214.37904   107.93782   37.02523         55.9633           47.21
   1                        219.62395    89.21508   27.32609         20.6542           29.38
   2                                     73.61779   22.13279          2.6477           15.34
   3                                                38.29435          0.0149            3.84
   5% CV I(2)   47.21000    29.38000     15.34000    3.84000
Output 32.1.6 shows the estimates of the long-run parameter, β, and the adjustment coefficient, α.

Output 32.1.6 Cointegration Rank Test Continued

Beta

   Variable          1          2          3          4
   y1          1.00000    1.00000    1.00000    1.00000
   y2         -0.46458   -0.63174   -0.69996   -0.16140
   y3         14.51619   -1.29864    1.37007   -0.61806
   y4         -9.35520    7.53672    2.47901    1.43731

Alpha

   Variable          1          2          3          4
   y1         -0.01396    0.01396   -0.01119    0.00008
   y2         -0.02811   -0.02739   -0.00032    0.00076
   y3         -0.00215   -0.04967   -0.00183   -0.00072
   y4          0.00510   -0.02514   -0.00220    0.00016
Output 32.1.7 shows the estimates η and ξ.

Output 32.1.7 Cointegration Rank Test Continued

Eta

   Variable           1            2           3            4
   y1          52.74907     41.74502   -20.80403     55.77415
   y2         -49.10609     -9.40081    98.87199     22.56416
   y3          68.29674   -144.83173   -27.35953     15.51142
   y4         121.25932    271.80496    85.85156   -130.11599

Xi

   Variable          1          2          3          4
   y1         -0.00842   -0.00052   -0.00208   -0.00250
   y2          0.00141    0.00213   -0.00736   -0.00058
   y3         -0.00445    0.00541   -0.00150    0.00310
   y4         -0.00211   -0.00064   -0.00130    0.00197
Output 32.1.8 shows that the VECM(2) is fit to the data. The ECM=(RANK=1) option produces the estimates of the long-run parameter, β, and the adjustment coefficient, α.

Output 32.1.8 Parameter Estimates

Analysis of U.S. Economic Variables

The VARMAX Procedure

   Type of Model        VECM(2)
   Estimation Method    Maximum Likelihood Estimation
   Cointegrated Rank    1

Beta

   Variable          1
   y1          1.00000
   y2         -0.46458
   y3         14.51619
   y4         -9.35520

Alpha

   Variable          1
   y1         -0.01396
   y2         -0.02811
   y3         -0.00215
   y4          0.00510
Output 32.1.9 shows the parameter estimates in terms of the constant, the lag one coefficients (y_{t-1}) contained in the αβ' estimates, and the coefficients associated with the lag one first differences (Δy_{t-1}).

Output 32.1.9 Parameter Estimates Continued

Constant

   Variable   Constant
   y1          0.04076
   y2          0.08595
   y3          0.00518
   y4         -0.01438

Parameter Alpha * Beta' Estimates

   Variable         y1         y2         y3         y4
   y1         -0.01396    0.00648   -0.20263    0.13059
   y2         -0.02811    0.01306   -0.40799    0.26294
   y3         -0.00215    0.00100   -0.03121    0.02011
   y4          0.00510   -0.00237    0.07407   -0.04774

AR Coefficients of Differenced Lag

   DIF Lag   Variable         y1         y2         y3         y4
   1         y1          0.34603    0.09131   -0.35351   -0.96895
             y2          0.09936    0.03791    0.23900    0.28661
             y3          0.18118    0.07859    0.02234    0.40508
             y4          0.03222    0.04961   -0.03292    0.18568
Output 32.1.10 shows the parameter estimates and their significance.

Output 32.1.10 Parameter Estimates Continued

Model Parameter Estimates

   Equation   Parameter    Estimate   Standard Error   t Value   Pr > |t|   Variable
   D_y1       CONST1        0.04076          0.01418      2.87     0.0048   1
              AR1_1_1      -0.01396          0.00495                        y1(t-1)
              AR1_1_2       0.00648          0.00230                        y2(t-1)
              AR1_1_3      -0.20263          0.07191                        y3(t-1)
              AR1_1_4       0.13059          0.04634                        y4(t-1)
              AR2_1_1       0.34603          0.06414      5.39     0.0001   D_y1(t-1)
              AR2_1_2       0.09131          0.07334      1.25     0.2154   D_y2(t-1)
              AR2_1_3      -0.35351          0.11024     -3.21     0.0017   D_y3(t-1)
              AR2_1_4      -0.96895          0.20737     -4.67     0.0001   D_y4(t-1)
   D_y2       CONST2        0.08595          0.01679      5.12     0.0001   1
              AR1_2_1      -0.02811          0.00586                        y1(t-1)
              AR1_2_2       0.01306          0.00272                        y2(t-1)
              AR1_2_3      -0.40799          0.08514                        y3(t-1)
              AR1_2_4       0.26294          0.05487                        y4(t-1)
              AR2_2_1       0.09936          0.07594      1.31     0.1932   D_y1(t-1)
              AR2_2_2       0.03791          0.08683      0.44     0.6632   D_y2(t-1)
              AR2_2_3       0.23900          0.13052      1.83     0.0695   D_y3(t-1)
              AR2_2_4       0.28661          0.24552      1.17     0.2453   D_y4(t-1)
   D_y3       CONST3        0.00518          0.01608      0.32     0.7476   1
              AR1_3_1      -0.00215          0.00562                        y1(t-1)
              AR1_3_2       0.00100          0.00261                        y2(t-1)
              AR1_3_3      -0.03121          0.08151                        y3(t-1)
              AR1_3_4       0.02011          0.05253                        y4(t-1)
              AR2_3_1       0.18118          0.07271      2.49     0.0140   D_y1(t-1)
              AR2_3_2       0.07859          0.08313      0.95     0.3463   D_y2(t-1)
              AR2_3_3       0.02234          0.12496      0.18     0.8584   D_y3(t-1)
              AR2_3_4       0.40508          0.23506      1.72     0.0873   D_y4(t-1)
   D_y4       CONST4       -0.01438          0.00803     -1.79     0.0758   1
              AR1_4_1       0.00510          0.00281                        y1(t-1)
              AR1_4_2      -0.00237          0.00130                        y2(t-1)
              AR1_4_3       0.07407          0.04072                        y3(t-1)
              AR1_4_4      -0.04774          0.02624                        y4(t-1)
              AR2_4_1       0.03222          0.03632      0.89     0.3768   D_y1(t-1)
              AR2_4_2       0.04961          0.04153      1.19     0.2345   D_y2(t-1)
              AR2_4_3      -0.03292          0.06243     -0.53     0.5990   D_y3(t-1)
              AR2_4_4       0.18568          0.11744      1.58     0.1164   D_y4(t-1)
Output 32.1.11 shows the innovation covariance matrix estimates, the various information criteria results, and the tests for white noise residuals. The residuals have significant correlations at lags 2 and 3. The Portmanteau test results are significant. These results suggest that a VECM(3) model might fit the data better than the VECM(2) model does.

Output 32.1.11 Diagnostic Checks

Covariances of Innovations

   Variable         y1         y2         y3         y4
   y1          0.00005    0.00001   -0.00001   -0.00000
   y2          0.00001    0.00007    0.00002    0.00001
   y3         -0.00001    0.00002    0.00007    0.00002
   y4         -0.00000    0.00001    0.00002    0.00002

Information Criteria

   AICC    -40.6284
   HQC     -40.4343
   AIC     -40.6452
   SBC     -40.1262
   FPEC    2.23E-18

Schematic Representation of Cross Correlations of Residuals

   Variable/Lag      0      1      2      3      4      5      6
   y1             ++..   ....   ++..   ....   +...   ..-.   ....
   y2             ++++   ....   ....   ....   ....   ....   ....
   y3             .+++   ....   +.-.   ..++   -...   ....   ....
   y4             .+++   ....   ....   ..+.   ....   ....   ....

   + is > 2*std error, - is < -2*std error, . is between

Portmanteau Test for Cross Correlations of Residuals

   Up To Lag   DF   Chi-Square   Pr > ChiSq
   3           16        53.90       <.0001
   4           32        74.03       <.0001
   5           48       103.08       <.0001
   6           64       116.94       <.0001
Model Parameter Estimates

   Equation   Parameter    Estimate   Standard Error   t Value   Pr > |t|   Variable
   y1         CONST1       -0.01672          0.01723     -0.97     0.3352   1
              AR1_1_1      -0.31963          0.12546     -2.55     0.0132   y1(t-1)
              AR1_1_2       0.14599          0.54567      0.27     0.7899   y2(t-1)
              AR1_1_3       0.96122          0.66431      1.45     0.1526   y3(t-1)
              AR2_1_1      -0.16055          0.12491     -1.29     0.2032   y1(t-2)
              AR2_1_2       0.11460          0.53457      0.21     0.8309   y2(t-2)
              AR2_1_3       0.93439          0.66510      1.40     0.1647   y3(t-2)
   y2         CONST2        0.01577          0.00437      3.60     0.0006   1
              AR1_2_1       0.04393          0.03186      1.38     0.1726   y1(t-1)
              AR1_2_2      -0.15273          0.13857     -1.10     0.2744   y2(t-1)
              AR1_2_3       0.28850          0.16870      1.71     0.0919   y3(t-1)
              AR2_2_1       0.05003          0.03172      1.58     0.1195   y1(t-2)
              AR2_2_2       0.01917          0.13575      0.14     0.8882   y2(t-2)
              AR2_2_3      -0.01020          0.16890     -0.06     0.9520   y3(t-2)
   y3         CONST3        0.01293          0.00353      3.67     0.0005   1
              AR1_3_1      -0.00242          0.02568     -0.09     0.9251   y1(t-1)
              AR1_3_2       0.22481          0.11168      2.01     0.0482   y2(t-1)
              AR1_3_3      -0.26397          0.13596     -1.94     0.0565   y3(t-1)
              AR2_3_1       0.03388          0.02556      1.33     0.1896   y1(t-2)
              AR2_3_2       0.35491          0.10941      3.24     0.0019   y2(t-2)
              AR2_3_3      -0.02223          0.13612     -0.16     0.8708   y3(t-2)
Output 32.2.4 shows the innovation covariance matrix estimates, the various information criteria results, and the tests for white noise residuals. The residuals are uncorrelated except at lag 3 for the y2 variable.

Output 32.2.4 Diagnostic Checks

Covariances of Innovations

   Variable        y1        y2        y3
   y1         0.00213   0.00007   0.00012
   y2         0.00007   0.00014   0.00006
   y3         0.00012   0.00006   0.00009

Information Criteria

   AICC    -24.4884
   HQC     -24.2869
   AIC     -24.5494
   SBC     -23.8905
   FPEC    2.18E-11

Cross Correlations of Residuals

   Lag   Variable         y1         y2         y3
   0     y1          1.00000    0.13242    0.28275
         y2          0.13242    1.00000    0.55526
         y3          0.28275    0.55526    1.00000
   1     y1          0.01461   -0.00666   -0.02394
         y2         -0.01125   -0.00167   -0.04515
         y3         -0.00993   -0.06780   -0.09593
   2     y1          0.07253   -0.00226   -0.01621
         y2         -0.08096   -0.01066   -0.02047
         y3         -0.02660   -0.01392   -0.02263
   3     y1          0.09915    0.04484    0.05243
         y2         -0.00289    0.14059    0.25984
         y3         -0.03364    0.05374    0.05644

Schematic Representation of Cross Correlations of Residuals

   Variable/Lag     0     1     2     3
   y1             +.+   ...   ...   ...
   y2             .++   ...   ...   ..+
   y3             +++   ...   ...   ...

   + is > 2*std error, - is < -2*std error, . is between

Portmanteau Test for Cross Correlations of Residuals

   Up To Lag   DF   Chi-Square   Pr > ChiSq
   3            9         9.69       0.3766
Output 32.2.5 describes how well each univariate equation fits the data. The residuals depart from normality but have no AR effects. The residuals for the y1 variable have an ARCH effect.

Output 32.2.5 Diagnostic Checks Continued

Univariate Model ANOVA Diagnostics

   Variable   R-Square   Standard Deviation   F Value   Pr > F
   y1           0.1286              0.04615      1.62   0.1547
   y2           0.1142              0.01172      1.42   0.2210
   y3           0.2513              0.00944      3.69   0.0032

Univariate Model White Noise Diagnostics

                Durbin               Normality                   ARCH
   Variable     Watson   Chi-Square   Pr > ChiSq   F Value   Pr > F
   y1          1.96269        10.22       0.0060     12.39   0.0008
   y2          1.98145        11.98       0.0025      0.38   0.5386
   y3          2.14583        34.25       <.0001      0.10   0.7480

Univariate Model AR Diagnostics

                     AR1                 AR2                 AR3                 AR4
   Variable   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F
   y1            0.01   0.9029      0.19   0.8291      0.39   0.7624      1.39   0.2481
   y2            0.00   0.9883      0.00   0.9961      0.46   0.7097      0.34   0.8486
   y3            0.68   0.4129      0.38   0.6861      0.30   0.8245      0.21   0.9320
Output 32.2.6 is the output in a matrix format associated with the PRINT=(IMPULSE=) option for the impulse response function and standard errors. The y3 variable in the first row is an impulse variable. The y1 variable in the first column is a response variable. The numbers 0.96122, 0.41555, and -0.40789 at lags 1 to 3 are decreasing.

Output 32.2.6 Impulse Response Function

Simple Impulse Response by Variable

   Variable
   Response\Impulse   Lag          y1         y2         y3
   y1                 1      -0.31963    0.14599    0.96122
                      STD     0.12546    0.54567    0.66431
                      2      -0.05430    0.26174    0.41555
                      STD     0.12919    0.54728    0.66311
                      3       0.11904    0.35283   -0.40789
                      STD     0.08362    0.38489    0.47867
   y2                 1       0.04393   -0.15273    0.28850
                      STD     0.03186    0.13857    0.16870
                      2       0.02858    0.11377   -0.08820
                      STD     0.03184    0.13425    0.16250
                      3      -0.00884    0.07147    0.11977
                      STD     0.01583    0.07914    0.09462
   y3                 1      -0.00242    0.22481   -0.26397
                      STD     0.02568    0.11168    0.13596
                      2       0.04517    0.26088    0.10998
                      STD     0.02563    0.10820    0.13101
                      3      -0.00055   -0.09818    0.09096
                      STD     0.01646    0.07823    0.10280
The proportions of decomposition of the prediction error covariances of the three variables are given in Output 32.2.7. If you look at the rows for the y3 variable, the output shows that about 64.713% of the one-step-ahead prediction error covariance of the variable y3t is accounted for by its own innovations, about 7.995% is accounted for by y1t innovations, and about 27.292% is accounted for by y2t innovations.
Output 32.2.7 Proportions of Prediction Error Covariance Decomposition

Proportions of Prediction Error Covariances by Variable

   Variable   Lead        y1        y2        y3
   y1         1      1.00000   0.00000   0.00000
              2      0.95996   0.01751   0.02253
              3      0.94565   0.02802   0.02633
              4      0.94079   0.02936   0.02985
              5      0.93846   0.03018   0.03136
              6      0.93831   0.03025   0.03145
   y2         1      0.01754   0.98246   0.00000
              2      0.06025   0.90747   0.03228
              3      0.06959   0.89576   0.03465
              4      0.06831   0.89232   0.03937
              5      0.06850   0.89212   0.03938
              6      0.06924   0.89141   0.03935
   y3         1      0.07995   0.27292   0.64713
              2      0.07725   0.27385   0.64890
              3      0.12973   0.33364   0.53663
              4      0.12870   0.33499   0.53631
              5      0.12859   0.33924   0.53217
              6      0.12852   0.33963   0.53185
The table in Output 32.2.8 gives forecasts and their prediction error covariances.

Output 32.2.8 Forecasts

Forecasts

   Variable   Obs     Time   Forecast   Standard Error   95% Confidence Limits
   y1          77   1979:1    6.54027          0.04615     6.44982    6.63072
               78   1979:2    6.55105          0.05825     6.43688    6.66522
               79   1979:3    6.57217          0.06883     6.43725    6.70708
               80   1979:4    6.58452          0.08021     6.42732    6.74173
               81   1980:1    6.60193          0.09117     6.42324    6.78063
   y2          77   1979:1    7.68473          0.01172     7.66176    7.70770
               78   1979:2    7.70508          0.01691     7.67193    7.73822
               79   1979:3    7.72206          0.02156     7.67980    7.76431
               80   1979:4    7.74266          0.02615     7.69140    7.79392
               81   1980:1    7.76240          0.03005     7.70350    7.82130
   y3          77   1979:1    7.54024          0.00944     7.52172    7.55875
               78   1979:2    7.55489          0.01282     7.52977    7.58001
               79   1979:3    7.57472          0.01808     7.53928    7.61015
               80   1979:4    7.59344          0.02205     7.55022    7.63666
               81   1980:1    7.61232          0.02578     7.56179    7.66286
Output 32.2.9 shows that you cannot reject Granger noncausality from (y2, y3) to y1 at the 0.05 significance level.

Output 32.2.9 Granger Causality Tests

Granger-Causality Wald Test

   Test   DF   Chi-Square   Pr > ChiSq
   1       4         6.37       0.1734

   Test 1:   Group 1 Variables:   y1
             Group 2 Variables:   y2 y3
The following SAS statements treat the variable y1 as an exogenous variable and fit the VARX(2,1) model to the data:

   proc varmax data=use;
      id date interval=qtr;
      model y2 y3 = y1 / p=2 dify=(1) difx=(1) xlag=1 lagmax=3
                         print=(estimates diagnose);
   run;
The fitted VARX(2,1) model is written as

   [ Δy2_t ]   [ 0.01542 ]   [ 0.02520 ]          [ 0.03870 ]
   [ Δy3_t ] = [ 0.01319 ] + [ 0.05130 ] Δy1_t +  [ 0.00363 ] Δy1_{t-1}

               [ -0.12258   0.25811 ] [ Δy2_{t-1} ]   [ 0.01651   0.03498 ] [ Δy2_{t-2} ]   [ ε1t ]
             + [  0.24367  -0.31809 ] [ Δy3_{t-1} ] + [ 0.34921  -0.01664 ] [ Δy3_{t-2} ] + [ ε2t ]

The detailed output is shown in Output 32.2.10 through Output 32.2.13.
Output 32.2.10 shows the parameter estimates in terms of the constant, the current and lag one coefficients of the exogenous variable, and the lag one and lag two coefficients of the dependent variables.

Output 32.2.10 Parameter Estimates

Analysis of German Economic Variables

The VARMAX Procedure

   Type of Model        VARX(2,1)
   Estimation Method    Least Squares Estimation

Constant

   Variable   Constant
   y2          0.01542
   y3          0.01319

XLag

   Lag   Variable        y1
   0     y2         0.02520
         y3         0.05130
   1     y2         0.03870
         y3         0.00363

AR

   Lag   Variable         y2         y3
   1     y2         -0.12258    0.25811
         y3          0.24367   -0.31809
   2     y2          0.01651    0.03498
         y3          0.34921   -0.01664
Output 32.2.11 shows the parameter estimates and their significance.

Output 32.2.11 Parameter Estimates Continued

Model Parameter Estimates

   Equation   Parameter    Estimate   Standard Error   t Value   Pr > |t|   Variable
   y2         CONST1        0.01542          0.00443      3.48     0.0009   1
              XL0_1_1       0.02520          0.03130      0.81     0.4237   y1(t)
              XL1_1_1       0.03870          0.03252      1.19     0.2383   y1(t-1)
              AR1_1_1      -0.12258          0.13903     -0.88     0.3811   y2(t-1)
              AR1_1_2       0.25811          0.17370      1.49     0.1421   y3(t-1)
              AR2_1_1       0.01651          0.13766      0.12     0.9049   y2(t-2)
              AR2_1_2       0.03498          0.16783      0.21     0.8356   y3(t-2)
   y3         CONST2        0.01319          0.00346      3.81     0.0003   1
              XL0_2_1       0.05130          0.02441      2.10     0.0394   y1(t)
              XL1_2_1       0.00363          0.02536      0.14     0.8868   y1(t-1)
              AR1_2_1       0.24367          0.10842      2.25     0.0280   y2(t-1)
              AR1_2_2      -0.31809          0.13546     -2.35     0.0219   y3(t-1)
              AR2_2_1       0.34921          0.10736      3.25     0.0018   y2(t-2)
              AR2_2_2      -0.01664          0.13088     -0.13     0.8992   y3(t-2)
Output 32.2.12 shows the innovation covariance matrix estimates, the various information criteria results, and the tests for white noise residuals. The residuals are uncorrelated except at lag 3 for the y2 variable.

Output 32.2.12 Diagnostic Checks

Covariances of Innovations

   Variable        y2        y3
   y2         0.00014   0.00006
   y3         0.00006   0.00009

Information Criteria

   AICC    -18.3902
   HQC     -18.2558
   AIC     -18.4309
   SBC     -17.9916
   FPEC    9.91E-9

Cross Correlations of Residuals

   Lag   Variable         y2         y3
   0     y2          1.00000    0.56462
         y3          0.56462    1.00000
   1     y2         -0.02312   -0.05927
         y3         -0.07056   -0.09145
   2     y2         -0.02849   -0.05262
         y3         -0.05804   -0.08567
   3     y2          0.16071    0.29588
         y3          0.10882    0.13002

Schematic Representation of Cross Correlations of Residuals

   Variable/Lag    0    1    2    3
   y2             ++   ..   ..   .+
   y3             ++   ..   ..   ..

   + is > 2*std error, - is < -2*std error, . is between

Portmanteau Test for Cross Correlations of Residuals

   Up To Lag   DF   Chi-Square   Pr > ChiSq
   3            4         8.38       0.0787
Output 32.2.13 describes how well each univariate equation fits the data. The residuals depart from normality but have no ARCH or AR effects.

Output 32.2.13 Diagnostic Checks Continued

Univariate Model ANOVA Diagnostics

   Variable   R-Square   Standard Deviation   F Value   Pr > F
   y2           0.0897              0.01188      1.08   0.3809
   y3           0.2796              0.00926      4.27   0.0011

Univariate Model White Noise Diagnostics

                Durbin               Normality                   ARCH
   Variable     Watson   Chi-Square   Pr > ChiSq   F Value   Pr > F
   y2          2.02413        14.54       0.0007      0.49   0.4842
   y3          2.13414        32.27       <.0001      0.08   0.7782

Univariate Model AR Diagnostics

                     AR1                 AR2                 AR3                 AR4
   Variable   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F   F Value   Pr > F
   y2            0.04   0.8448      0.04   0.9570      0.62   0.6029      0.42   0.7914
   y3            0.62   0.4343      0.62   0.5383      0.72   0.5452      0.36   0.8379
Example 32.3: Numerous Examples

The following are examples of syntax for model fitting:

/* Data 'a' Generated Process */
proc iml;
   sig = {1.0 0.5, 0.5 1.25};
   phi = {1.2 -0.5, 0.6 0.3};
   call varmasim(y,phi) sigma = sig n = 100 seed = 46859;
   cn = {'y1' 'y2'};
   create a from y[colname=cn];
   append from y;
quit;

/* when the series has a linear trend */
proc varmax data=a;
   model y1 y2 / p=1 trend=linear;
run;

/* Fit subset of AR order 1 and 3 */
proc varmax data=a;
   model y1 y2 / p=(1,3);
run;

/* Check if the series is nonstationary */
proc varmax data=a;
   model y1 y2 / p=1 dftest print=(roots);
run;

/* Fit VAR(1) in differencing */
proc varmax data=a;
   model y1 y2 / p=1 print=(roots) dify=(1);
run;

/* Fit VAR(1) in seasonal differencing */
proc varmax data=a;
   model y1 y2 / p=1 dify=(4) lagmax=5;
run;

/* Fit VAR(1) in both regular and seasonal differencing */
proc varmax data=a;
   model y1 y2 / p=1 dify=(1,4) lagmax=5;
run;

/* Fit VAR(1) in different differencing */
proc varmax data=a;
   model y1 y2 / p=1 dif=(y1(1,4) y2(1)) lagmax=5;
run;

/* Options related to prediction */
proc varmax data=a;
   model y1 y2 / p=1 lagmax=3
                 print=(impulse covpe(5) decompose(5));
run;

/* Options related to tentative order selection */
proc varmax data=a;
   model y1 y2 / p=1 lagmax=5 minic
                 print=(parcoef pcancorr pcorr);
run;

/* Automatic selection of the AR order */
proc varmax data=a;
   model y1 y2 / minic=(type=aic p=5);
run;

/* Compare results of LS and Yule-Walker Estimators */
proc varmax data=a;
   model y1 y2 / p=1 print=(yw);
run;

/* BVAR(1) of the nonstationary series y1 and y2 */
proc varmax data=a;
   model y1 y2 / p=1 prior=(lambda=1 theta=0.2 ivar);
run;

/* BVAR(1) of the nonstationary series y1 */
proc varmax data=a;
   model y1 y2 / p=1 prior=(lambda=0.1 theta=0.15 ivar=(y1));
run;

/* Data 'b' Generated Process */
proc iml;
   sig = { 0.5  0.14 -0.08 -0.03,
           0.14 0.71  0.16  0.1,
          -0.08 0.16  0.65  0.23,
          -0.03 0.1   0.23  0.16};
   sig = sig * 0.0001;
   phi = {1.2 -0.5  0.   0.1,
          0.6  0.3 -0.2  0.5,
          0.4  0.  -0.2  0.1,
         -1.0  0.2  0.7 -0.2};
   call varmasim(y,phi) sigma = sig n = 100 seed = 32567;
   cn = {'y1' 'y2' 'y3' 'y4'};
   create b from y[colname=cn];
   append from y;
quit;

/* Cointegration Rank Test using Trace statistics */
proc varmax data=b;
   model y1-y4 / p=2 lagmax=4 cointtest;
run;

/* Cointegration Rank Test using Max statistics */
proc varmax data=b;
   model y1-y4 / p=2 lagmax=4 cointtest=(johansen=(type=max));
run;
/* Common Trends Test using Filter(Differencing) statistics */
proc varmax data=b;
   model y1-y4 / p=2 lagmax=4 cointtest=(sw);
run;

/* Common Trends Test using Filter(Residual) statistics */
proc varmax data=b;
   model y1-y4 / p=2 lagmax=4 cointtest=(sw=(type=filtres lag=1));
run;

/* Common Trends Test using Kernel statistics */
proc varmax data=b;
   model y1-y4 / p=2 lagmax=4 cointtest=(sw=(type=kernel lag=1));
run;

/* Cointegration Rank Test for I(2) */
proc varmax data=b;
   model y1-y4 / p=2 lagmax=4 cointtest=(johansen=(iorder=2));
run;

/* Fit VECM(2) with rank=3 */
proc varmax data=b;
   model y1-y4 / p=2 lagmax=4 print=(roots iarr)
                 ecm=(rank=3 normalize=y1);
run;

/* Weak Exogenous Testing for each variable */
proc varmax data=b outstat=bbb;
   model y1-y4 / p=2 lagmax=4 ecm=(rank=3 normalize=y1);
   cointeg rank=3 exogeneity;
run;

/* Hypotheses Testing for long-run and adjustment parameter */
proc varmax data=b outstat=bbb;
   model y1-y4 / p=2 lagmax=4 ecm=(rank=3 normalize=y1);
   cointeg rank=3 normalize=y1
           h=(1 0 0, 0 1 0, -1 0 0, 0 0 1)
           j=(1 0 0, 0 1 0, 0 0 1, 0 0 0);
run;

/* ordinary regression model */
proc varmax data=grunfeld;
   model y1 y2 = x1-x3;
run;

/* Ordinary regression model with subset lagged terms */
proc varmax data=grunfeld;
   model y1 y2 = x1 / xlag=(1,3);
run;

/* VARX(1,1) with no current time Exogenous Variables */
proc varmax data=grunfeld;
   model y1 y2 = x1 / p=1 xlag=1 nocurrentx;
run;

/* VARX(1,1) with different Exogenous Variables */
proc varmax data=grunfeld;
   model y1 = x3, y2 = x1 x2 / p=1 xlag=1;
run;

/* VARX(1,2) in difference with current Exogenous Variables */
proc varmax data=grunfeld;
   model y1 y2 = x1 / p=1 xlag=2 difx=(1) dify=(1);
run;
Example 32.4: Illustration of ODS Graphics

This example illustrates the use of ODS Graphics. The graphical displays are requested by specifying the ODS GRAPHICS ON statement. For information about the graphics available in the VARMAX procedure, see the section “ODS Graphics” on page 2191.

The following statements use the SASHELP.WORKERS data set to study the time series of electrical workers and its interaction with the series of masonry workers. The series and predict plots, the residual plot, and the forecast plot are created in Output 32.4.1 through Output 32.4.3. These are a selection of the plots created by the VARMAX procedure.

title "Illustration of ODS Graphics";
proc varmax data=sashelp.workers
            plot(unpack)=(residual model forecasts);
   id date interval=month;
   model electric masonry / dify=(1,12) noint p=1;
   output lead=12;
run;
Output 32.4.1 Series and Predicted Series Plots
Output 32.4.2 Residual Plot
Output 32.4.3 Series and Forecast Plots
Chapter 33
The X11 Procedure

Contents
    Overview: X11 Procedure                                        2228
    Getting Started: X11 Procedure                                 2228
        Basic Seasonal Adjustment                                  2229
        X-11-ARIMA                                                 2232
    Syntax: X11 Procedure                                          2234
        Functional Summary                                         2234
        PROC X11 Statement                                         2236
        ARIMA Statement                                            2237
        BY Statement                                               2240
        ID Statement                                               2240
        MACURVES Statement                                         2240
        MONTHLY Statement                                          2241
        OUTPUT Statement                                           2245
        PDWEIGHTS Statement                                        2246
        QUARTERLY Statement                                        2246
        SSPAN Statement                                            2249
        TABLES Statement                                           2250
        VAR Statement                                              2250
    Details: X11 Procedure                                         2250
        Historical Development of X-11                             2250
        Implementation of the X-11 Seasonal Adjustment Method      2252
        Computational Details for Sliding Spans Analysis           2256
        Data Requirements                                          2258
        Missing Values                                             2259
        Prior Daily Weights and Trading-Day Regression             2259
        Adjustment for Prior Factors                               2260
        The YRAHEADOUT Option                                      2261
        Effect of Backcast and Forecast Length                     2261
        Details of Model Selection                                 2262
        OUT= Data Set                                              2265
        The OUTSPAN= Data Set                                      2265
        OUTSTB= Data Set                                           2265
        OUTTDR= Data Set                                           2266
        Printed Output                                             2268
        ODS Table Names                                            2279
    Examples: X11 Procedure                                        2283
        Example 33.1: Component Estimation—Monthly Data            2283
        Example 33.2: Components Estimation—Quarterly Data         2287
        Example 33.3: Outlier Detection and Removal                2289
    References                                                     2291
Overview: X11 Procedure

The X11 procedure, an adaptation of the U.S. Bureau of the Census X-11 Seasonal Adjustment program, seasonally adjusts monthly or quarterly time series. The procedure makes additive or multiplicative adjustments and creates an output data set containing the adjusted time series and intermediate calculations.

The X11 procedure also provides the X-11-ARIMA method developed by Statistics Canada. This method fits an ARIMA model to the original series, then uses the model forecast to extend the original series. This extended series is then seasonally adjusted by the standard X-11 seasonal adjustment method. The extension of the series improves the estimation of the seasonal factors and reduces revisions to the seasonally adjusted series as new data become available.

The X11 procedure incorporates sliding spans analysis. This type of analysis provides a diagnostic for determining the suitability of seasonal adjustment for an economic series.

Seasonal adjustment of a series is based on the assumption that seasonal fluctuations can be measured in the original series, O_t, t = 1, ..., n, and separated from trend cycle, trading-day, and irregular fluctuations. The seasonal component of this time series, S_t, is defined as the intrayear variation that is repeated constantly or in an evolving fashion from year to year. The trend cycle component, C_t, includes variation due to the long-term trend, the business cycle, and other long-term cyclical factors. The trading-day component, D_t, is the variation that can be attributed to the composition of the calendar. The irregular component, I_t, is the residual variation. Many economic time series are related in a multiplicative fashion (O_t = S_t C_t D_t I_t). A seasonally adjusted time series, C_t I_t, consists of only the trend cycle and irregular components.
Getting Started: X11 Procedure

The most common use of the X11 procedure is to produce a seasonally adjusted series. Eliminating the seasonal component from an economic series facilitates comparison among consecutive months or quarters. A plot of the seasonally adjusted series is often more informative about trends or location in a business cycle than a plot of the unadjusted series.

The following example shows how to use PROC X11 to produce a seasonally adjusted series, C_t I_t, from an original series O_t = S_t C_t D_t I_t.
In the multiplicative model, the trend cycle component C_t keeps the same scale as the original series O_t, while S_t, D_t, and I_t vary around 1.0. In all printed tables and in the output data set, these latter components are expressed as percentages, and thus will vary around 100.0 (in the additive case, they vary around 0.0).

The naming convention used in PROC X11 for the tables follows the original U.S. Bureau of the Census X-11 Seasonal Adjustment program specification (Shiskin, Young, and Musgrave 1967). Also, see the section “Printed Output” on page 2268. This convention is outlined in Figure 33.1.

The tables corresponding to parts A – C are intermediate calculations. The final estimates of the individual components are found in the D tables: table D10 contains the final seasonal factors, table D12 contains the final trend cycle, and table D13 contains the final irregular series. If you are primarily interested in seasonally adjusting a series without consideration of intermediate calculations or diagnostics, you only need to look at table D11, the final seasonally adjusted series.

For further details about the X-11-ARIMA tables, see Ladiray and Quenneville (2001).
Basic Seasonal Adjustment

Suppose you have monthly retail sales data starting in September 1978 in a SAS data set named SALES. At this point you do not suspect that any calendar effects are present, and there are no prior adjustments that need to be made to the data. In this simplest case, you need only specify the DATE= variable in the MONTHLY statement, which associates a SAS date value to each observation. To see the results of the seasonal adjustment, you must request table D11, the final seasonally adjusted series, in a TABLES statement.

data sales;
   input sales @@;
   date = intnx( 'month', '01sep1978'd, _n_-1 );
   format date monyy7.;
datalines;

   ... more lines ...

/*--- X-11 ARIMA ---*/
proc x11 data=sales;
   monthly date=date;
   var sales;
   tables d11;
run;
Figure 33.1 Basic Seasonal Adjustment

                         The X11 Procedure
                  Seasonal Adjustment of - sales

                X-11 Seasonal Adjustment Program
                   U. S. Bureau of the Census
             Economic Research and Analysis Division
                        November 1, 1968

   The X-11 program is divided into seven major parts.
   Part  Description
    A.   Prior adjustments, if any
    B.   Preliminary estimates of irregular component weights
         and regression trading day factors
    C.   Final estimates of above
    D.   Final estimates of seasonal, trend-cycle and
         irregular components
    E.   Analytical tables
    F.   Summary measures
    G.   Charts

   Series - sales
   Period covered - 9/1978 to 8/1990

   Type of run: multiplicative seasonal adjustment.
   Selected Tables or Charts.
   Sigma limits for graduating extreme values are 1.5 and 2.5
   Irregular values outside of 2.5-sigma limits are excluded
   from trading day regression
Figure 33.2 Basic Seasonal Adjustment

                  D11 Final Seasonally Adjusted Series
Year        JAN       FEB       MAR       APR       MAY       JUN
1978          .         .         .         .         .         .
1979    124.935   126.533   125.282   125.650   127.754   129.648
1980    128.734   139.542   143.726   143.854   148.723   144.530
1981    176.329   166.264   167.433   167.509   173.573   175.541
1982    186.747   202.467   192.024   202.761   197.548   206.344
1983    233.109   223.345   218.179   226.389   224.249   227.700
1984    238.261   239.698   246.958   242.349   244.665   247.005
1985    275.766   282.316   294.169   285.034   294.034   296.114
1986    325.471   332.228   330.401   330.282   333.792   331.349
1987    363.592   373.118   368.670   377.650   380.316   376.297
1988    370.966   384.743   386.833   405.209   380.840   389.132
1989    428.276   418.236   429.409   446.467   437.639   440.832
1990    480.631   474.669   486.137   483.140   481.111   499.169
---------------------------------------------------------------------
Avg     277.735   280.263   282.435   286.358   285.354   288.638

                  D11 Final Seasonally Adjusted Series
Year        JUL       AUG       SEP       OCT       NOV       DEC      Total
1978          .         .   123.507   125.776   124.735   129.870    503.887
1979    127.880   129.285   126.562   134.905   133.356   136.117    1547.91
1980    140.120   153.475   159.281   162.128   168.848   165.159    1798.12
1981    179.301   182.254   187.448   197.431   184.341   184.304    2141.73
1982    211.690   213.691   214.204   218.060   228.035   240.347    2513.92
1983    222.045   222.127   222.835   212.227   230.187   232.827    2695.22
1984    251.247   253.805   264.924   266.004   265.366   277.025    3037.31
1985    294.196   309.162   311.539   319.518   318.564   323.921    3604.33
1986    337.095   341.127   346.173   350.183   360.792   362.333    4081.23
1987    379.668   375.607   374.257   372.672   368.135   364.150    4474.13
1988    385.479   377.147   397.404   403.156   413.843   416.142    4710.89
1989    450.103   454.176   460.601   462.029   427.499   485.113    5340.38
1990    485.370   485.103         .         .         .         .    3875.33
--------------------------------------------------------------------------
Avg     288.683   291.413   265.728   268.674   268.642   276.442

Total:  40324     Mean:  280.03     S.D.:  111.31
You can compare the original series, table B1, and the final seasonally adjusted series, table D11, by plotting them together. These tables are requested and named in the OUTPUT statement.

title 'Monthly Retail Sales Data (in $1000)';
proc x11 data=sales noprint;
   monthly date=date;
   var sales;
   output out=out b1=sales d11=adjusted;
run;

proc sgplot data=out;
   series x=date y=sales /
          markers markerattrs=(color=red symbol='asterisk')
          lineattrs=(color=red)
          legendlabel="original";
   series x=date y=adjusted /
          markers markerattrs=(color=blue symbol='circle')
          lineattrs=(color=blue)
          legendlabel="adjusted";
   yaxis label='Original and Seasonally Adjusted Time Series';
run;
Figure 33.3 Plot of Original and Seasonally Adjusted Data
X-11-ARIMA An inherent problem with the X-11 method is the revision of the seasonal factor estimates as new data become available. The X-11 method uses a set of centered moving averages to estimate the seasonal components. These moving averages apply symmetric weights to all observations except those at the beginning and end of the series, where asymmetric weights have to be applied. These asymmetric weights can cause poor estimates of the seasonal factors, which then can cause large revisions when new data become available.
While large revisions to seasonally adjusted values are not common, they can happen. When they do happen, it undermines the credibility of the X-11 seasonal adjustment method.

A method to address this problem was developed at Statistics Canada (Dagum 1980, 1982a). This method, known as X-11-ARIMA, applies an ARIMA model to the original data (after adjustments, if any) to forecast the series one or more years. This extended series is then seasonally adjusted, allowing symmetric weights to be applied to the end of the original data. This method was tested against a large number of Canadian economic series and was found to greatly reduce the amount of revisions as new data were added.

The X-11-ARIMA method is available in PROC X11 through the use of the ARIMA statement. The ARIMA statement extends the original series either with a user-specified ARIMA model or by an automatic selection process in which the best model from a set of five predefined ARIMA models is used.

The following example illustrates the use of the ARIMA statement. The ARIMA statement does not contain a user-specified model, so the best model is chosen by the automatic selection process. Forecasts from this best model are then used to extend the original series by one year. The following partial listing shows parameter estimates and model diagnostics for the ARIMA model chosen by the automatic selection process.

proc x11 data=sales;
   monthly date=date;
   var sales;
   arima;
run;
Figure 33.4 X-11-ARIMA Model Selection

              Monthly Retail Sales Data (in $1000)
                      The X11 Procedure
               Seasonal Adjustment of - sales

            Conditional Least Squares Estimation
                            Approx.
Parameter     Estimate     Std Error    t Value    Lag
MU           0.0001728     0.0009596       0.18      0
MA1,1        0.3739984     0.0893427       4.19      1
MA1,2        0.0231478     0.0892154       0.26      2
MA2,1        0.5727914     0.0790835       7.24     12

            Conditional Least Squares Estimation
Variance Estimate    =     0.0014313
Std Error Estimate   =     0.0378326
AIC                  =     -482.2412*
SBC                  =     -470.7404*
Number of Residuals  =           131

* Does not include log determinant
Figure 33.4 continued

Criteria Summary for Model 2: (0,1,2)(0,1,1)s, Log Transform

Box-Ljung Chi-square: 22.03 with 21 df Prob= 0.40
   (Criteria prob > 0.05)
Test for over-differencing: sum of MA parameters = 0.57
   (must be < 0.90)
MAPE - Last Three Years:        2.84 (Must be < 15.00 %)
     - Last Year:               3.04
     - Next to Last Year:       1.96
     - Third from Last Year:    3.51
Table D11 (final seasonally adjusted series) is now constructed using symmetric weights on observations at the end of the actual data. This should result in better estimates of the seasonal factors and, thus, smaller revisions in Table D11 as more data become available.
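By default the ARIMA statement extends the series one year ahead. As a minimal sketch (reusing the SALES data set from the earlier examples), the series can instead be extended two years ahead and one year back with the FORECAST= and BACKCAST= options documented in the “ARIMA Statement” section:

   proc x11 data=sales;
      monthly date=date;
      var sales;
      arima forecast=2 backcast=1;
   run;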
Syntax: X11 Procedure

The X11 procedure uses the following statements:

PROC X11 options ;
   ARIMA options ;
   BY variables ;
   ID variables ;
   MACURVES option ;
   MONTHLY options ;
   OUTPUT OUT=dataset options ;
   PDWEIGHTS option ;
   QUARTERLY options ;
   SSPAN options ;
   TABLES tablenames ;
   VAR variables ;
Either the MONTHLY or QUARTERLY statement must be specified, depending on the type of time series data you have. The PDWEIGHTS and MACURVES statements can be used only with the MONTHLY statement. The TABLES statement controls the printing of tables, while the OUTPUT statement controls the creation of the OUT= data set.
Functional Summary

The statements and options that control the X11 procedure are summarized in the following table.
Description                                             Statement            Option

Data Set Options
specify input data set                                  PROC X11             DATA=
write the trading-day regression results to an
  output data set                                       PROC X11             OUTTDR=
write the stable seasonality test results to an
  output data set                                       PROC X11             OUTSTB=
write table values to an output data set                OUTPUT               OUT=
add extrapolated values to the output data set          PROC X11             OUTEX
add year ahead estimates to the output data set         PROC X11             YRAHEADOUT
write the sliding spans analysis results to an
  output data set                                       PROC X11             OUTSPAN=

Printing Control Options
suppress all printed output                             PROC X11             NOPRINT
suppress all printed ARIMA output                       ARIMA                NOPRINT
print all ARIMA output                                  ARIMA                PRINTALL
print selected tables and charts                        TABLES
print selected groups of tables                         MONTHLY, QUARTERLY   PRINTOUT=
print selected groups of charts                         MONTHLY, QUARTERLY   CHARTS=
print preliminary tables associated with ARIMA
  processing                                            ARIMA                PRINTFP
specify number of decimals for printed tables           MONTHLY, QUARTERLY   NDEC=
suppress all printed SSPAN output                       SSPAN                NOPRINT
print all SSPAN output                                  SSPAN                PRINTALL

Date Information Options
specify a SAS date variable                             MONTHLY, QUARTERLY   DATE=
specify the beginning date                              MONTHLY, QUARTERLY   START=
specify the ending date                                 MONTHLY, QUARTERLY   END=
specify beginning year for trading-day regression       MONTHLY              TDCOMPUTE=

Declaring the Role of Variables
specify BY-group processing                             BY
specify the variables to be seasonally adjusted         VAR
specify identifying variables                           ID
specify the prior monthly factor                        MONTHLY              PMFACTOR=

Controlling the Table Computations
use additive adjustment                                 MONTHLY, QUARTERLY   ADDITIVE
specify seasonal factor moving average length           MACURVES
specify the extreme value limit for trading-day
  regression                                            MONTHLY              EXCLUDE=
specify the lower bound for extreme irregulars          MONTHLY, QUARTERLY   FULLWEIGHT=
specify the upper bound for extreme irregulars          MONTHLY, QUARTERLY   ZEROWEIGHT=
include the length-of-month in trading-day
  regression                                            MONTHLY              LENGTH
specify trading-day regression action                   MONTHLY              TDREGR=
compute summary measure only                            MONTHLY, QUARTERLY   SUMMARY
modify extreme irregulars prior to trend
  cycle estimation                                      MONTHLY, QUARTERLY   TRENDADJ
specify moving average length in trend
  cycle estimation                                      MONTHLY, QUARTERLY   TRENDMA=
specify weights for prior trading-day factors           PDWEIGHTS
PROC X11 Statement

PROC X11 options ;
The following options can appear in the PROC X11 statement:

DATA= SAS-data-set
   specifies the input SAS data set used. If it is omitted, the most recently created SAS data set is used.

OUTEXTRAP
   adds the extra observations used in ARIMA processing to the output data set. When ARIMA forecasting/backcasting is requested, extra observations are appended to the ends of the series, and the calculations are carried out on this extended series. The appended observations are not normally written to the OUT= data set. However, if OUTEXTRAP is specified, these extra observations are written to the output data set. If a DATE= variable is specified in the MONTHLY/QUARTERLY statement, the date variable is extrapolated to identify forecasts/backcasts. The OUTEXTRAP option can be abbreviated as OUTEX.

NOPRINT
   suppresses any printed output. The NOPRINT option overrides any PRINTOUT=, CHARTS=, or TABLES statement and any output associated with the ARIMA statement.
OUTSPAN= SAS-data-set
   specifies the output data set to store the sliding spans analysis results. Tables A1, C18, D10, and D11 for each span are written to this data set. See the section “The OUTSPAN= Data Set” on page 2265 for details.

OUTSTB= SAS-data-set
   specifies the output data set to store the stable seasonality test results (table D8). All the information in the analysis of variance table associated with the stable seasonality test is contained in the variables written to this data set. See the section “OUTSTB= Data Set” on page 2265 for details.

OUTTDR= SAS-data-set
   specifies the output data set to store the trading-day regression results (tables B15 and C15). All the information in the analysis of variance table associated with the trading-day regression is contained in the variables written to this data set. This option is valid only when TDREGR=PRINT, TEST, or ADJUST is specified in the MONTHLY statement. See the section “OUTTDR= Data Set” on page 2266 for details.

YRAHEADOUT
   adds one-year-ahead forecast values to the output data set for tables C16, C18, and D10. The original purpose of this option was to avoid recomputation of the seasonal adjustment factors when new data became available. While computing costs were an important factor when the X-11 method was developed, this is no longer the case and this option is obsolete. See the section “The YRAHEADOUT Option” on page 2261 for details.
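For example, a minimal sketch (assuming the SALES data set from the “Getting Started” section) that writes the stable seasonality test results to a data set named STABTEST:

   proc x11 data=sales outstb=stabtest;
      monthly date=date;
      var sales;
   run;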
ARIMA Statement

ARIMA options ;
The ARIMA statement applies the X-11-ARIMA method to the series specified in the VAR statement. This method uses an ARIMA model estimated from the original data to extend the series one or more years. The ARIMA statement options control the ARIMA model used and the estimation, forecasting, and printing of this model.

There are two ways of obtaining an ARIMA model to extend the series. A model can be given explicitly with the MODEL= and TRANSFORM= options. Alternatively, the best-fitting model from a set of five predefined models is found automatically whenever the MODEL= option is absent. See the section “Details of Model Selection” on page 2262 for details.

BACKCAST= n
   specifies the number of years to backcast the series. The default is BACKCAST= 0. See the section “Effect of Backcast and Forecast Length” on page 2261 for details.

CHICR= value
   specifies the criteria for the significance level for the Box-Ljung chi-square test for lack of fit when testing the five predefined models. The default is CHICR= 0.05. The CHICR= option values must be between 0.01 and 0.90. The hypothesis being tested is that of model adequacy. Nonrejection of the hypothesis is evidence for an adequate model. Making the CHICR= value smaller makes it easier to accept the model. See the section “Criteria Details” on page 2263 for further details on the CHICR= option.

CONVERGE= value
   specifies the convergence criterion for the estimation of an ARIMA model. The default value is 0.001. The CONVERGE= value must be positive.

FORECAST= n
   specifies the number of years to forecast the series. The default is FORECAST= 1. See the section “Effect of Backcast and Forecast Length” on page 2261 for details.

MAPECR= value
   specifies the criteria for the mean absolute percent error (MAPE) when testing the five predefined models. A small MAPE value is evidence for an adequate model; a large MAPE value results in the model being rejected. The MAPECR= value is the boundary for acceptance/rejection. Thus a larger MAPECR= value would make it easier for a model to pass the criteria. The default is MAPECR= 15. The MAPECR= option values must be between 1 and 100. See the section “Criteria Details” on page 2263 for further details on the MAPECR= option.

MAXITER= n
   specifies the maximum number of iterations in the estimation process. MAXITER must be between 1 and 60; the default value is 15.

METHOD= CLS
METHOD= ULS
METHOD= ML
   specifies the estimation method. ML requests maximum likelihood, ULS requests unconditional least squares, and CLS requests conditional least squares. METHOD=CLS is the default. The maximum likelihood estimates are more expensive to compute than the conditional least squares estimates. In some cases, however, they can be preferable. For further information on the estimation methods, see “Estimation Details” on page 252 in Chapter 7, “The ARIMA Procedure.”

MODEL= ( P=n1 Q=n2 SP=n3 SQ=n4 DIF=n5 SDIF=n6 <NOINT> <CENTER> )
   specifies the ARIMA model. The AR and MA orders are given by P=n1 and Q=n2, respectively, while the seasonal AR and MA orders are given by SP=n3 and SQ=n4, respectively. The lag corresponding to seasonality is determined by the MONTHLY or QUARTERLY statement. Similarly, differencing and seasonal differencing are given by DIF=n5 and SDIF=n6, respectively. For example,

      arima model=( p=2 q=1 sp=1 dif=1 sdif=1 );

   specifies a (2,1,1)(1,1,0)s model, where s, the seasonality, is either 12 (monthly) or 4 (quarterly). More examples of the MODEL= syntax are given in the section “Details of Model Selection” on page 2262.
NOINT
   suppresses the fitting of a constant (or intercept) parameter in the model. (That is, the parameter is omitted.)

CENTER
   centers each time series by subtracting its sample mean. The analysis is done on the centered data. Later, when forecasts are generated, the mean is added back. Note that centering is done after differencing. The CENTER option is normally used in conjunction with the NOINT option. For example, to fit an AR(1) model on the centered data without an intercept, use the following ARIMA statement:

      arima model=( p=1 center noint );

NOPRINT
   suppresses the normal printout generated by the ARIMA statement. Note that the effect of specifying the NOPRINT option in the ARIMA statement is different from the effect of specifying NOPRINT in the PROC X11 statement, since the former affects only ARIMA output.

OVDIFCR= value
   specifies the criteria for the over-differencing test when testing the five predefined models. When the MA parameters in one of these models sum to a number close to 1.0, this is an indication of over-parameterization and the model is rejected. The OVDIFCR= value is the boundary for this rejection; values greater than this value fail the over-differencing test. A larger OVDIFCR= value would make it easier for a model to pass the criteria. The default is OVDIFCR= 0.90. The OVDIFCR= option values must be between 0.80 and 0.99. See the section “Criteria Details” on page 2263 for further details on the OVDIFCR= option.

PRINTALL
   provides the same output as the default printing for all models fit and, in addition, prints an estimation summary and chi-square statistics for each model fit. See “Printed Output” on page 2268 for details.

PRINTFP
   prints the results for the initial pass of X11 made to exclude trading-day effects. This option has an effect only when the TDREGR= option specifies ADJUST, TEST, or PRINT. In these cases, an initial pass of the standard X11 method is required to remove calendar effects before doing any ARIMA estimation. Usually this first pass is not of interest, and by default no tables are printed. However, specifying PRINTFP in the ARIMA statement causes any tables printed in the final pass to also be printed for this initial pass.

TRANSFORM= (LOG) | LOG
TRANSFORM= ( constant ** power )
   The ARIMA statement in PROC X11 allows certain transformations on the series before estimation. The specified transformation is applied only to a user-specified model. If TRANSFORM= is specified and the MODEL= option is not specified, the transformation request is ignored and a warning is printed.
   The LOG transformation requests that the natural log of the series be used for estimation. The resulting forecast values are transformed back to the original scale.

   A general power transformation of the form X_t -> (X_t + a)^b is obtained by specifying

      transform= ( a ** b )

   If the constant a is not specified, it is assumed to be zero. The specified ARIMA model is then estimated using the transformed series. The resulting forecast values are transformed back to the original scale.
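For illustration, the following sketches (with arbitrary model orders, not taken from the original text) attach a log transform and a power transform of the form X_t -> (X_t + 1)^0.5 to user-specified models:

   /* natural log transform; forecasts are transformed back automatically */
   arima model=( p=2 q=1 sp=1 dif=1 sdif=1 ) transform=( log );

   /* power transform X_t -> (X_t + 1)**0.5 */
   arima model=( p=1 dif=1 ) transform=( 1 ** 0.5 );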
BY Statement

BY variables ;
A BY statement can be used with PROC X11 to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input DATA= data set to be sorted in order of the BY variables.
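For example, a minimal sketch (the data set SALES2 and its grouping variable REGION are hypothetical) that adjusts each region separately; the input must first be sorted by the BY variable:

   proc sort data=sales2;
      by region;
   run;

   proc x11 data=sales2;
      by region;
      monthly date=date;
      var sales;
   run;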
ID Statement

ID variables ;
If you are creating an output data set, use the ID statement to put values of the ID variables, in addition to the table values, into the output data set. The ID statement has no effect when an output data set is not created. If the DATE= variable is specified in the MONTHLY or QUARTERLY statement, this variable is included automatically in the OUTPUT data set. If no DATE= variable is specified, the variable _DATE_ is added. The date variable (or _DATE_) values outside the range of the actual data (from ARIMA forecasting or backcasting, or from YRAHEADOUT) are extrapolated, while all other ID variables are missing.
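As a sketch (the data set SALES2 and the auxiliary variable PROMO are hypothetical), the ID statement copies extra input variables into the output data set alongside the requested table values:

   proc x11 data=sales2;
      id promo;
      monthly date=date;
      var sales;
      output out=adjusted d11=sales_adj;
   run;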
MACURVES Statement

MACURVES month=option . . . ;

The MACURVES statement specifies the length of the moving-average curves for estimating the seasonal factors for any month. This statement can be used only with monthly time series data.

The month=option specifications consist of the month name (or the first three letters of the month name), an equal sign, and one of the following option values:

'3'      specifies a three-term moving average for the month
'3X3'    specifies a three-by-three moving average
'3X5'    specifies a three-by-five moving average
'3X9'    specifies a three-by-nine moving average
STABLE   specifies a stable seasonal factor (average of all values for the month)

For example, the statement

   macurves jan='3' feb='3x3' march='3x5' april='3x9';

uses a three-term moving average to estimate seasonal factors for January, a 3x3 (a three-term moving average of a three-term moving average) for February, a 3x5 (a three-term moving average of a five-term moving average) for March, and a 3x9 (a three-term moving average of a nine-term moving average) for April.

The numeric values used for the weights of the various moving averages and a discussion of the derivation of these weights are given in Shiskin, Young, and Musgrave (1967). A general discussion of moving average weights is given in Dagum (1985).

If the specification for a month is omitted, the X11 procedure uses a three-by-three moving average for the first estimate of each iteration and a three-by-five average for the second estimate.
MONTHLY Statement

MONTHLY options ;

The MONTHLY statement must be used when the input data to PROC X11 are a monthly time series. The MONTHLY statement specifies options that determine the computations performed by PROC X11 and what is included in its output. Either the DATE= or START= option must be used.

The following options can appear in the MONTHLY statement.

ADDITIVE
   performs additive adjustments. If the ADDITIVE option is omitted, PROC X11 performs multiplicative adjustments.

CHARTS= STANDARD
CHARTS= FULL
CHARTS= NONE
   specifies the charts produced by the procedure. The default is CHARTS=STANDARD, which specifies 12 monthly seasonal charts and a trend cycle chart. If you specify CHARTS=FULL (or CHARTS=ALL), the procedure prints additional charts of irregular and seasonal factors. To print no charts, specify CHARTS=NONE.

   The TABLES statement can also be used to specify particular monthly charts to be printed. If no CHARTS= option is given, and a TABLES statement is given, the TABLES statement overrides the default value of CHARTS=STANDARD; that is, no charts (or tables) are printed except those specified in the TABLES statement. However, if both the CHARTS= option and a TABLES statement are given, the charts corresponding to the CHARTS= option and those requested by the TABLES statement are printed.

   For example, suppose you wanted only charts G1, the final seasonally adjusted series and trend cycle, and G4, the final irregular and final modified irregular series. You would specify the following statements:

      monthly date=date;
      tables g1 g4;
DATE= variable
   specifies a variable that gives the date for each observation. The starting and ending dates are obtained from the first and last values of the DATE= variable, which must contain SAS date values. The procedure checks values of the DATE= variable to ensure that the input observations are sequenced correctly. This variable is automatically added to the OUTPUT= data set if one is requested and extrapolated if necessary. If the DATE= option is not specified, the START= option must be specified.

   The DATE= option and the START= and END= options can be used in combination to subset a series for processing. For example, suppose you have 12 years of monthly data (144 observations, no missing values) beginning in January 1970 and ending in December 1981, and you wanted to seasonally adjust only six years beginning in January 1974. Specifying

      monthly date=date start=jan1974 end=dec1979;

   would seasonally adjust only this subset of the data. If instead you wanted to adjust the last eight years of data, only the START= option is needed:

      monthly date=date start=jan1974;
END= mmmyyyy
   specifies that only the part of the input series ending with the month and year given be adjusted (for example, END=DEC1970). See the DATE= variable option for using the START= and END= options to subset a series for processing.

EXCLUDE= value
   excludes from the trading-day regression any irregular values that are more than value standard deviations from the mean. The EXCLUDE= value must be between 0.1 and 9.9, with the default value being 2.5.

FULLWEIGHT= value
   assigns weights to irregular values based on their distance from the mean in standard deviation units. The weights are used for estimating seasonal and trend cycle components. Irregular values less than the FULLWEIGHT= value (in standard deviation units) are assigned full weights of 1, values that fall between the ZEROWEIGHT= and FULLWEIGHT= limits are assigned weights linearly graduated between 0 and 1, and values greater than the ZEROWEIGHT= limit are assigned a weight of 0. For example, if ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean would be assigned a graduated weight of (2.0 - 1.3)/(2.0 - 1.0) = 0.7.

   The FULLWEIGHT= value must be between 0.1 and 9.9 but must be less than the ZEROWEIGHT= value. The default is FULLWEIGHT=1.5.

LENGTH
   includes length-of-month allowance in computing trading-day factors. If this option is omitted, length-of-month allowances are included with the seasonal factors.

NDEC= n
   specifies the number of decimal places shown in the printed tables in the listing. This option has no effect on the precision of the variable values in the output data set.

PMFACTOR= variable
   specifies a variable containing the prior monthly factors. Use this option if you have previous knowledge of monthly adjustment factors. The PMFACTOR= option can be used to make the following adjustments:

   - adjust the level of all or part of a series with discontinuities
   - adjust for the influence of holidays that fall on different dates from year to year, such as the effect of Easter on certain retail sales
   - adjust for unreasonable weather influence on series, such as housing starts
   - adjust for changing starting dates of fiscal years (for budget series) or model years (for automobiles)
   - adjust for temporary dislocating events, such as strikes

   See the section “Prior Daily Weights and Trading-Day Regression” on page 2259 for details and examples using the PMFACTOR= option.
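For instance, a minimal sketch (the data set SALES2 and its prior factor variable HOLFAC are hypothetical, not from this chapter) that supplies Easter-type holiday factors as prior monthly factors:

   proc x11 data=sales2;
      monthly date=date pmfactor=holfac;
      var sales;
   run;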
PRINTOUT= STANDARD | LONG | FULL | NONE
   specifies the tables to be printed by the procedure. If the PRINTOUT=STANDARD option is specified, between 17 and 27 tables are printed, depending on the other options that are specified. PRINTOUT=LONG prints between 27 and 39 tables, and PRINTOUT=FULL prints between 44 and 59 tables. Specifying PRINTOUT=NONE results in no tables being printed; however, charts are still printed. The default is PRINTOUT=STANDARD.

   The TABLES statement can also be used to specify particular monthly tables to be printed. If no PRINTOUT= option is specified, and a TABLES statement is given, the TABLES statement overrides the default value of PRINTOUT=STANDARD; that is, no tables (or charts) are printed except those given in the TABLES statement. However, if both the PRINTOUT= option and a TABLES statement are specified, the tables corresponding to the PRINTOUT= option and those requested by the TABLES statement are printed.

START= mmmyyyy
   adjusts only the part of the input series starting with the specified month and year. When the DATE= option is not used, the START= option gives the year and month of the first input observation — for example, START=JAN1966. START= must be specified if DATE= is not given. If START= is specified (and no DATE= option is given), and an OUT= data set is requested, a variable named _DATE_ is added to the data set, giving the date value for each observation. See the DATE= variable option for using the START= and END= options to subset a series.

SUMMARY
   specifies that the data are already seasonally adjusted and the procedure is to produce summary measures. If the SUMMARY option is omitted, the X11 procedure performs seasonal adjustment of the input data before calculating summary measures.

TDCOMPUTE= year
   uses the part of the input series beginning with January of the specified year to derive trading-day weights. If this option is omitted, the entire series is used.

TDREGR= NONE | PRINT | ADJUST | TEST
   specifies the treatment of trading-day regression. TDREGR=NONE omits the computation of the trading-day regression. TDREGR=PRINT computes and prints the trading-day regressions but does not adjust the series. TDREGR=ADJUST computes and prints the trading-day regression and adjusts the irregular components to obtain preliminary weights. TDREGR=TEST adjusts the final series if the trading-day regression estimates explain significant variation on the basis of an F test (or residual trading-day variation if prior weights are used). The default is TDREGR=NONE. See the section “Prior Daily Weights and Trading-Day Regression” on page 2259 for details and examples using the TDREGR= option.

   If ARIMA processing is requested, any value of TDREGR= other than the default TDREGR=NONE will cause PROC X11 to perform an initial pass (see the section “Details: X11 Procedure” on page 2250 and the PRINTFP option).

   The significance level reported in Table C15 should be viewed with caution. The dependent variable in the trading-day regression is the irregular component formed by an averaging operation. This induces a correlation in the dependent variable and hence in the residuals from which the F test is computed. Hence the distribution of the trading-day regression F statistics differs from an exact F; see Cleveland and Devlin (1980) for details.
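For example, a minimal sketch (using the SALES data set from the “Getting Started” section) that computes the trading-day regression and uses it to adjust the series:

   proc x11 data=sales;
      monthly date=date tdregr=adjust;
      var sales;
   run;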
TRENDADJ
   modifies extreme irregular values prior to computing the trend cycle estimates in the first iteration. If the TRENDADJ option is omitted, the trend cycle is computed without modifications for extremes.

TRENDMA= 9 | 13 | 23
   specifies the number of terms in the moving average to be used by the procedure in estimating the variable trend cycle component. The value of the TRENDMA= option must be 9, 13, or 23. If the TRENDMA= option is omitted, the procedure selects an appropriate moving average. For information about the number of terms in the moving average, see Shiskin, Young, and Musgrave (1967).

ZEROWEIGHT= value
   assigns weights to irregular values based on their distance from the mean in standard deviation units. The weights are used for estimating seasonal and trend cycle components. Irregular values beyond the standard deviation limit specified in the ZEROWEIGHT= option are assigned zero weights. Values that fall between the two limits (ZEROWEIGHT= and FULLWEIGHT=) are assigned weights linearly graduated between 0 and 1. For example, if ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean would be assigned a graduated weight. The ZEROWEIGHT= value must be between 0.1 and 9.9 but must be greater than the FULLWEIGHT= value. The default is ZEROWEIGHT=2.5.

   The ZEROWEIGHT= option can be used in conjunction with the FULLWEIGHT= option to adjust outliers from a monthly or quarterly series. See Example 33.3 later in this chapter for an illustration of this use.
OUTPUT Statement

OUTPUT OUT= SAS-data-set tablename=var1 var2 . . . ;

The OUTPUT statement creates an output data set containing specified tables. The data set is named by the OUT= option.

OUT= SAS-data-set
   If OUT= is omitted, the SAS System names the new data set by using the DATAn convention.

For each table to be included in the output data set, write the X11 table identification keyword, an equal sign, and a list of new variable names:

   tablename = var1 var2 ...

The tablename keywords that can be used in the OUTPUT statement are listed in the section “Printed Output” on page 2268. The following is an example of a VAR statement and an OUTPUT statement:

   var z1 z2 z3;
   output out=out_x11 b1=s d11=w x y;
The variable s contains the table B1 values for the variable z1, while the table D11 values for variables z1, z2, and z3 are contained in variables w, x, and y, respectively. As this example shows, the list of variables following a tablename= keyword can be shorter than the VAR variable list.

In addition to the variables named by tablename=var1 var2 . . . , the ID variables, and BY variables, the output data set contains a date identifier variable. If the DATE= option is given in the MONTHLY or QUARTERLY statement, the DATE= variable is the date identifier. If no DATE= option is given, a variable named _DATE_ is the date identifier.
PDWEIGHTS Statement

PDWEIGHTS day=w . . . ;

The PDWEIGHTS statement can be used to specify one to seven daily weights. The statement can be used only with monthly series that are seasonally adjusted using the multiplicative model. These weights are used to compute prior trading-day factors, which are then used to adjust the original series prior to the seasonal adjustment process. Only relative weights are needed; the X11 procedure adjusts the weights so that they sum to 7.0. The weights can also be corrected by the procedure on the basis of estimates of trading-day variation from the input data.

See the section “Prior Daily Weights and Trading-Day Regression” on page 2259 for details and examples using the PDWEIGHTS statement.

Each day=w option specifies a weight (w) for the named day. The day can be any day, Sunday through Saturday. The day keyword can be the full spelling of the day, or the three-letter abbreviation. For example, SATURDAY=1.0 and SAT=1.0 are both valid. The weights w must be a numeric value between 0.0 and 10.0.

The following is an example of a PDWEIGHTS statement:

   pdweights sun=.2 mon=.9 tue=1 wed=1 thu=1 fri=.8 sat=.3;

Any number of days can be specified with one PDWEIGHTS statement. (In this example the weights sum to 5.2, so the procedure rescales each weight by 7/5.2, approximately 1.35, before use.) The default weight value for any day that is not specified is 0. If you do not use a PDWEIGHTS statement, the program computes daily weights if TDREGR=ADJUST is specified. See Shiskin, Young, and Musgrave (1967) for details.
QUARTERLY Statement

QUARTERLY options ;

The QUARTERLY statement must be used when the input data are quarterly time series. This statement includes options that determine the computations performed by the procedure and what is in the printed output. The DATE= option or the START= option must be used.

The following options can appear in the QUARTERLY statement.

ADDITIVE
   performs additive adjustments. If this option is omitted, the procedure performs multiplicative adjustments.

CHARTS= STANDARD
CHARTS= FULL
CHARTS= NONE
   specifies the charts to be produced by the procedure. The default value is CHARTS=STANDARD, which specifies four quarterly seasonal charts and a trend cycle chart. If you specify CHARTS=FULL (or CHARTS=ALL), the procedure prints additional charts of irregular and seasonal factors. To print no charts, specify CHARTS=NONE.

   The TABLES statement can also be used to specify particular charts to be printed. The presence of a TABLES statement overrides the default value of CHARTS=STANDARD; that is, if a TABLES statement is specified, and no CHARTS= option is specified, no charts (nor tables) are printed except those given in the TABLES statement. However, if both the CHARTS= option and a TABLES statement are given, the charts corresponding to the CHARTS= option and those requested by the TABLES statement are printed.

   For example, suppose you wanted only charts G1, the final seasonally adjusted series and trend cycle, and G4, the final irregular and final modified irregular series. This is accomplished by specifying the following statements:

      quarterly date=date;
      tables g1 g4;
DATE= variable
   specifies a variable that gives the date for each observation. The starting and ending dates are obtained from the first and last values of the DATE= variable, which must contain SAS date values. The procedure checks values of the DATE= variable to ensure that the input observations are sequenced correctly. This variable is automatically added to the OUTPUT= data set if one is requested, and extrapolated if necessary. If the DATE= option is not specified, the START= option must be specified.

   The DATE= option and the START= and END= options can be used in combination to subset a series for processing. For example, suppose you have a series with 10 years of quarterly data (40 observations, no missing values) beginning in '1970Q1' and ending in '1979Q4', and you want to seasonally adjust only four years beginning in '1974Q1' and ending in '1977Q4'. Specifying

      quarterly date=variable start='1974q1' end='1977q4';

   seasonally adjusts only this subset of the data. If instead you wanted to adjust the last six years of data, only the START= option is needed:

      quarterly date=variable start='1974q1';
END= 'yyyyQq'
   specifies that only the part of the input series ending with the quarter and year given be adjusted (for example, END='1973Q4'). The specification must be enclosed in quotes and q must be 1, 2, 3, or 4. See the DATE= variable option for using the START= and END= options to subset a series.

FULLWEIGHT= value
   assigns weights to irregular values based on their distance from the mean in standard deviation units. The weights are used for estimating seasonal and trend cycle components. Irregular values less than the FULLWEIGHT= value (in standard deviation units) are assigned full weights of 1, values that fall between the ZEROWEIGHT= and FULLWEIGHT= limits are assigned weights linearly graduated between 0 and 1, and values greater than the ZEROWEIGHT= limit are assigned a weight of 0. For example, if ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean would be assigned a graduated weight. The default is FULLWEIGHT=1.5.

NDEC= n
   specifies the number of decimal places shown on the output tables. This option has no effect on the precision of the variables in the output data set.

PRINTOUT= STANDARD
PRINTOUT= LONG
PRINTOUT= FULL
PRINTOUT= NONE
   specifies the tables to print. If PRINTOUT=STANDARD is specified, between 17 and 27 tables are printed, depending on the other options that are specified. PRINTOUT=LONG prints between 27 and 39 tables, and PRINTOUT=FULL prints between 44 and 59 tables. Specifying PRINTOUT=NONE results in no tables being printed. The default is PRINTOUT=STANDARD.

   The TABLES statement can also specify particular quarterly tables to be printed. If no PRINTOUT= is given, and a TABLES statement is given, the TABLES statement overrides the default value of PRINTOUT=STANDARD; that is, no tables (or charts) are printed except those given in the TABLES statement. However, if both the PRINTOUT= option and a TABLES statement are given, the tables corresponding to the PRINTOUT= option and those requested by the TABLES statement are printed.

START= 'yyyyQq'
   adjusts only the part of the input series starting with the quarter and year given. When the DATE= option is not used, the START= option gives the year and quarter of the first input observation (for example, START='1967Q1'). The specification must be enclosed in quotes, and q must be 1, 2, 3, or 4. START= must be specified if the DATE= option is not given. If START= is specified (and no DATE= is given), and an OUTPUT= data set is requested, a variable named _DATE_ is added to the data set, giving the date value for a given observation. See the DATE= option for using the START= and END= options to subset a series.

SUMMARY
   specifies that the input is already seasonally adjusted and that the procedure is to produce summary measures. If this option is omitted, the procedure performs seasonal adjustment of the input data before calculating summary measures.

TRENDADJ
   modifies extreme irregular values prior to computing the trend cycle estimates. If this option is omitted, the trend cycle is computed without modification for extremes.

ZEROWEIGHT= value
   assigns weights to irregular values based on their distance from the mean in standard deviation units. The weights are used for estimating seasonal and trend cycle components. Irregular values beyond the standard deviation limit specified in the ZEROWEIGHT= option are assigned zero weights. Values that fall between the two limits (ZEROWEIGHT= and FULLWEIGHT=) are assigned weights linearly graduated between 0 and 1. For example, if ZEROWEIGHT=2 and FULLWEIGHT=1, a value 1.3 standard deviations from the mean would be assigned a graduated weight. The default is ZEROWEIGHT=2.5.

   The ZEROWEIGHT= option can be used in conjunction with the FULLWEIGHT= option to adjust outliers from a monthly or quarterly series. See Example 33.3 later in this chapter for an illustration of this use.
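For illustration, a minimal sketch (the quarterly data set QSALES and its date variable QDATE are hypothetical) that tightens both limits so that moderate outliers are downweighted more aggressively:

   proc x11 data=qsales;
      quarterly date=qdate fullweight=1.0 zeroweight=2.0;
      var sales;
   run;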
SSPAN Statement

SSPAN options ;

The SSPAN statement applies sliding spans analysis to determine the suitability of seasonal adjustment for an economic series.

The following options can appear in the SSPAN statement:

NDEC= n
   specifies the number of decimal places shown on selected sliding span reports. This option has no effect on the precision of the variable values in the OUTSPAN output data set.

CUTOFF= value
   gives the percentage value for determining an excessive difference within a span for the seasonal factors, the seasonally adjusted series, and month-to-month and year-to-year differences in the seasonally adjusted series. The default value is 3.0. The use of the CUTOFF= value in determining the maximum percent difference (MPD) is described in the section “Computational Details for Sliding Spans Analysis” on page 2256. Caution should be used in changing the default CUTOFF= value. The empirical threshold ranges found by the U.S. Census Bureau no longer apply when the value is changed.

TDCUTOFF= value
   gives the percentage value for determining an excessive difference within a span for the trading-day factors. The default value is 2.0. The use of the TDCUTOFF= value in determining the maximum percent difference (MPD) is described in the section “Computational Details for Sliding Spans Analysis” on page 2256. Caution should be used in changing the default TDCUTOFF= value. The empirical threshold ranges found by the U.S. Census Bureau no longer apply when the value is changed.

NOPRINT
   suppresses all sliding span reports. See “Computational Details for Sliding Spans Analysis” on page 2256 for more details on sliding span reports.

PRINT
   prints the summary sliding span reports S 0 through S 6.E.
PRINTALL
prints the summary sliding spans reports S 0 through S 6.E, along with detail reports S 7.A through S 7.E.
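For example, a sketch that requests the sliding spans detail reports with two decimal places (the data set and variable names are hypothetical):

proc x11 data=sales;
   monthly date=date;
   var sales;
   sspan ndec=2 printall;   /* summary reports S 0-S 6.E plus detail reports S 7.A-S 7.E */
run;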
TABLES Statement TABLES tablenames ;
The TABLES statement prints the tables specified in addition to the tables that are printed as a result of the PRINTOUT= option in the MONTHLY or QUARTERLY statement. Table names are listed in Table 33.4 later in this chapter. To print only selected tables, omit the PRINTOUT= option in the MONTHLY or QUARTERLY statement and list the tables to be printed in the TABLES statement. For example, to print only the final seasonal factors and final seasonally adjusted series, use the statement tables d10 d11;
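In context, a complete step that prints only these two tables might look like the following sketch (the data set and variable names are hypothetical):

proc x11 data=sales;
   monthly date=date;   /* no PRINTOUT= option, so only the listed tables are printed */
   var sales;
   tables d10 d11;      /* final seasonal factors and final seasonally adjusted series */
run;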
VAR Statement VAR variables ;
The VAR statement is used to specify the variables in the input data set that are to be analyzed by the procedure. Only numeric variables can be specified. If the VAR statement is omitted, all numeric variables are analyzed except those appearing in a BY or ID statement or the variable named in the DATE= option in the MONTHLY or QUARTERLY statement.
Details: X11 Procedure
Historical Development of X-11 This section briefly describes the historical development of the standard X-11 seasonal adjustment method and the later development of the X-11-ARIMA method. Most of the following discussion is based on a comprehensive article by Bell and Hillmer (1984), which describes the history of X-11 and the justification of using seasonal adjustment methods, such as X-11, given the current availability of time series software. For further discussions about statistical problems associated with the X-11 method, see Ghysels (1990).
Seasonal adjustment methods began to be developed in the 1920s and 1930s, before there were suitable analytic models available and before electronic computing devices were in existence. The lack of any suitable model led to methods that worked the same for any series — that is, methods that were not model-based and that could be applied to any series. Experience with economic series had shown that a given mathematical form could adequately represent a time series only for a fixed length; as more data were added, the model became inadequate. This suggested an approach that used moving averages. For further analysis of the properties of X-11 moving averages, see Cleveland and Tiao (1976). The basic method was to break up an economic time series into long-term trend, long-term cyclical movements, seasonal movements, and irregular fluctuations. Early investigators found that it was not possible to uniquely decompose the trend and cycle components. Thus, these two were grouped together; the resulting component is usually referred to as the “trend cycle component.” It was also found that estimating seasonal components in the presence of trend produced biased estimates of the seasonal components, but, at the same time, estimating trend in the presence of seasonality was difficult. This eventually led to the iterative approach used in the X-11 method. Two other problems were encountered by early investigators. First, some economic series appear to have changing or evolving seasonality. Second, moving averages were very sensitive to extreme values. The estimation method used in the X-11 method allows for evolving seasonal components. For the second problem, the X-11 method uses repeated adjustment of extreme values. All of these problems encountered in the early investigation of seasonal adjustment methods suggested the use of moving averages in estimating components. Even with the use of moving averages instead of a model-based method, massive amounts of hand calculations were required. Only a small number of series could be adjusted, and little experimentation could be done to evaluate variations on the method. With the advent of electronic computing in the 1950s, work on seasonal adjustment methods proceeded rapidly. These methods still used the framework previously described; variants of these basic methods could now be easily tested against a large number of series. Much of the work was done by Julian Shiskin and others at the U.S. Bureau of the Census beginning in 1954 and culminating after a number of variants into the X-11 Variant of the Census Method II Seasonal Adjustment Program, which PROC X11 implements. References for this work during this period include Shiskin and Eisenpress (1957), Shiskin (1958), and Marris (1961). The authoritative documentation for the X-11 Variant is in Shiskin, Young, and Musgrave (1967). This document is not equivalent to a program specification; however, the FORTRAN code that implements the X-11 Variant is in the public domain. A less detailed description of the X-11 Variant is given in U.S. Bureau of the Census (1969).
Development of the X-11-ARIMA Method The X-11 method uses symmetric moving averages in estimating the various components. At the end of the series, however, these symmetric weights cannot be applied. Either asymmetric weights have to be used, or some method of extending the series must be found.
While various methods of extending a series have been proposed, the most important method to date has been the X-11-ARIMA method developed at Statistics Canada. This method uses Box-Jenkins ARIMA models to extend the series. The Time Series Research and Analysis Division of Statistics Canada investigated 174 Canadian economic series and found five ARIMA models out of twelve that fit the majority of series well and reduced revisions for the most recent months. References that give details of various aspects of the X-11-ARIMA methodology include Dagum (1980, 1982a, c, 1983, 1988), Laniel (1985), Lothian and Morry (1978a), and Huot et al. (1986).
Differences between X11ARIMA/88 and PROC X11 The original implementation of the X-11-ARIMA method was by Statistics Canada in 1980 (Dagum 1980), with later changes and enhancements made in 1988 (Dagum 1988). The calculations performed by PROC X11 differ from those in X11ARIMA/88, which will result in differences in the final component estimates provided by these implementations. There are three areas where Statistics Canada made changes to the original X-11 seasonal adjustment method in developing X11ARIMA/80 (Monsell 1984). These are (a) selection of extreme values, (b) replacement of extreme values, and (c) generation of seasonal and trend cycle weights. These changes have not been implemented in the current version of PROC X11. Thus the procedure produces results identical to those from previous versions of PROC X11 in the absence of an ARIMA statement. Additional differences can result from the ARIMA estimation. X11ARIMA/88 uses conditional least squares (CLS), while CLS, unconditional least squares (ULS), and maximum likelihood (ML) are all available in PROC X11 by using the METHOD= option in the ARIMA statement. Generally, parameter estimates will differ for the different methods.
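For example, a sketch that requests maximum likelihood estimation for the ARIMA extension (the data set and the airline-style model shown here are illustrative assumptions):

proc x11 data=sales;
   monthly date=date;
   var sales;
   /* METHOD=ML requests maximum likelihood; CLS and ULS are also available */
   arima model=((0,1,1)(0,1,1)) method=ml;
run;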
Implementation of the X-11 Seasonal Adjustment Method The following steps describe the analysis of a monthly time series using multiplicative seasonal adjustment. Additional steps used by the X-11-ARIMA method are also indicated. Equivalent descriptions apply for an additive model if you replace divide with subtract where applicable. In the multiplicative adjustment, the original series $O_t$ is assumed to be of the form
$$O_t = C_t S_t I_t P_t D_t$$
where $C_t$ is the trend cycle component, $S_t$ is the seasonal component, $I_t$ is the irregular component, $P_t$ is the prior monthly factors component, and $D_t$ is the trading-day component. The trading-day component can be further factored as
$$D_t = D_{r,t} D_{tr,t}$$
where $D_{tr,t}$ are the trading-day factors derived from the prior daily weights, and $D_{r,t}$ are the residual trading-day factors estimated from the trading-day regression. For further information about estimating trading day variation, see Young (1965).
Additional Steps When Using the X-11-ARIMA Method The X-11-ARIMA method consists of extending a given series by an ARIMA model and applying the usual X-11 seasonal adjustment method to this extended series. Thus in the simplest case in which there are no prior factors or calendar effects in the series, the ARIMA model selection, estimation, and forecasting are performed first, and the resulting extended series goes through the standard X-11 steps described in the next section. If prior factor or calendar effects are present, they must be eliminated from the series before the ARIMA estimation is done because these effects are not stochastic. Prior factors, if present, are removed first. Calendar effects represented by prior daily weights are then removed. If there are no further calendar effects, the adjusted series is extended by the ARIMA model, and this extended series goes through the standard X-11 steps without repeating the removal of prior factors and calendar effects from prior daily weights. If further calendar effects are present, a trading-day regression must be performed. In this case it is necessary to go through an initial pass of the X-11 steps to obtain a final trading-day adjustment. In this initial pass, the series, adjusted for prior factors and prior daily weights, goes through the standard X-11 steps. At the conclusion of these steps, a final series adjusted for prior factors and all calendar effects is available. This adjusted series is then extended by the ARIMA model, and this extended series goes through the standard X-11 steps again, without repeating the removal of prior factors and calendar effects from prior daily weights and trading-day regression.
The Standard X-11 Seasonal Adjustment Method The standard X-11 seasonal adjustment method consists of the following steps. These steps are applied to the original data or the original data extended by an ARIMA model. 1. In step 1, the data are read, ignoring missing values until the first nonmissing value is found. If prior monthly factors are present, the procedure reads the prior monthly factors $P_t$ and divides them into the original series to obtain $O_t / P_t = C_t S_t I_t D_{tr,t} D_{r,t}$. Seven daily weights can be specified to develop monthly factors to adjust the series for trading-day variation, $D_{tr,t}$; these factors are then divided into the original or prior adjusted series to obtain $C_t S_t I_t D_{r,t}$. 2. In steps 2, 3, and 4, three iterations are performed, each of which provides estimates of the seasonal $S_t$, trading-day $D_{r,t}$, trend cycle $C_t$, and irregular component $I_t$. Each iteration refines estimates of the extreme values in the irregular components. After extreme values are identified and modified, final estimates of the seasonal component, seasonally adjusted series, trend cycle, and irregular components are produced. Step 2 consists of three substeps:
a) During the first iteration, a centered, 12-term moving average is applied to the original series $O_t$ to provide a preliminary estimate $\hat{C}_t$ of the trend cycle curve $C_t$. This moving average combines 13 consecutive monthly values (a 2-term moving average of a 12-term moving average), removing the $S_t$ and $I_t$. Next, it obtains a preliminary estimate $\widehat{S_t I_t}$ by
$$\widehat{S_t I_t} = \frac{O_t}{\hat{C}_t}$$
b) A moving average is then applied to the $\widehat{S_t I_t}$ to obtain an estimate $\hat{S}_t$ of the seasonal factors. $\widehat{S_t I_t}$ is then divided by this estimate to obtain an estimate $\hat{I}_t$ of the irregular component. Next, a moving standard deviation is calculated from the irregular component and is used in assigning a weight to each monthly value for measuring its degree of extremeness. These weights are used to modify extreme values in $\widehat{S_t I_t}$. New seasonal factors are estimated by applying a moving average to the modified value of $\widehat{S_t I_t}$. A preliminary seasonally adjusted series is obtained by dividing the original series by these new seasonal factors. A second estimate of the trend cycle is obtained by applying a weighted moving average to this seasonally adjusted series.
c) The same process is used to obtain second estimates of the seasonally adjusted series and improved estimates of the irregular component. This irregular component is again modified for extreme values and then used to provide estimates of trading-day factors and refined weights for the identification of extreme values. 3. Using the same computations, a second iteration is performed on the original series that has been adjusted by the trading-day factors and irregular weights developed in the first iteration. The second iteration produces final estimates of the trading-day factors and irregular weights. 4. A third and final iteration is performed using the original series that has been adjusted for trading-day factors and irregular weights computed during the second iteration. During the third iteration, PROC X11 develops final estimates of seasonal factors, the seasonally adjusted series, the trend cycle, and the irregular components. The procedure computes summary measures of variation and produces a moving average of the final adjusted series.
Sliding Spans Analysis The motivation for sliding spans analysis is to answer the question, When is an economic series unsuitable for seasonal adjustment? There have been a number of past attempts to answer this question: the stable seasonality F test, the moving seasonality F test, Q statistics, and others. Sliding spans analysis attempts to quantify the stability of the seasonal adjustment process, and hence quantify the suitability of seasonal adjustment for a given series. It is based on a very simple idea: for a stable series, deleting a small number of observations should not result in greatly different component estimates compared with the original, full series. Conversely, if deleting a small number of observations results in drastically different estimates, the series is unstable. For example, a drastic difference in the seasonal factors (Table D10) might result from a dominating irregular component or sudden changes in the seasonal component. When the seasonal component estimates of a series are unstable in this manner, they have little meaning and the series is likely to be unsuitable for seasonal adjustment.
Sliding spans analysis, developed at the Statistical Research Division of the U.S. Census Bureau (Findley et al. 1990; Findley and Monsell 1986), performs a repeated seasonal adjustment on subsets or spans of the full series. In particular, an initial span of the data, typically eight years in length, is seasonally adjusted, and Table C18 (the trading-day factors, if a trading-day regression is performed), Table D10 (the seasonal factors), and Table D11 (the seasonally adjusted series) are retained for further processing. Next, one year of data is deleted from the beginning of the initial span and one year of data is added. This new span is seasonally adjusted as before, with the same tables retained. This process continues until the end of the data is reached. The beginning and ending dates of the spans are such that the last observation in the original data is also the last observation in the last span. This is discussed in more detail in the following paragraphs. The following notation for the components or differences computed in the sliding spans analysis follows Findley et al. (1990). The symbol $X_t(k)$ denotes component $X$ in month (or quarter) $t$, computed from data in the $k$th span. These components are now defined.
Seasonal Factors (Table D10): $S_t(k)$
Trading-Day Factors (Table C18): $TD_t(k)$
Seasonally Adjusted Data (Table D11): $SA_t(k)$
Month-to-Month Changes in the Seasonally Adjusted Data: $MM_t(k)$
Year-to-Year Changes in the Seasonally Adjusted Data: $YY_t(k)$
The key measure is the maximum percent difference across spans. For example, consider a series that begins in January 1972, ends in December 1984, and has four spans, each of length 8 years (see Figure 1 in Findley et al. (1990), p. 346). Consider $S_t(k)$, the seasonal factor (Table D10) for month $t$ computed from span $k$, and let $N_t$ denote the set of spans that contain month $t$; that is,
$$N_t = \{k : \text{span } k \text{ contains month } t\}$$
In the middle years of the series there is overlap of all four spans, and $N_t$ will contain four spans. The last year of the series will have only one span, while the beginning can have one or zero spans depending on the original length. Since we are interested in how much the seasonal factors vary for a given month across the spans, a natural quantity to consider is
$$\max_{k \in N_t} S_t(k) - \min_{k \in N_t} S_t(k)$$
In the case of the multiplicative model, it is useful to compute a percentage difference; define the maximum percentage difference (MPD) at time $t$ as
$$\mathrm{MPD}_t = \frac{\max_{k \in N_t} S_t(k) - \min_{k \in N_t} S_t(k)}{\min_{k \in N_t} S_t(k)}$$
The seasonal factor for month $t$ is then unreliable if $\mathrm{MPD}_t$ is large. While no exact significance level can be computed for this statistic, empirical levels have been established by considering over
500 economic series (Findley et al. 1990; Findley and Monsell 1986). For these series it was found that for four spans, stable series typically had less than 15% of the MPD values exceeding 3.0%, while in marginally stable series, between 15% and 25% of the MPD values exceeded 3.0%. A series in which 25% or more of the MPD values exceeded 3.0% is almost always unstable. While these empirical values cannot be considered an exact significance level, they provide a useful empirical basis for deciding if a series is suitable for seasonal adjustment. These percentage values are shifted down when fewer than four spans are used.
Computational Details for Sliding Spans Analysis Length and Number of Spans The algorithm for determining the length and number of spans for a given series was developed at the U.S. Bureau of the Census, Statistical Research Division. A summary of this algorithm is as follows. First, an initial length based on the MACURVES month=option specification is determined, and then the maximum number of spans possible using this length is determined. If this maximum number exceeds four, set the number of spans to four. If this maximum number is one or zero, there are not enough observations to perform the sliding spans analysis. In this case a note is written to the log and the sliding spans analysis is skipped for this variable. If the maximum number of spans is two or three, the actual number of spans used is set equal to this maximum. Finally, the length is adjusted so that the spans begin in January (or the first quarter) of the beginning year of the span. The remainder of this section gives the computation formulas for the maximum percentage difference (MPD) calculations along with the threshold regions.
Seasonal Factors (Table D10) For the additive model, the MPD is defined as
$$\max_{k \in N_t} S_t(k) - \min_{k \in N_t} S_t(k)$$
For the multiplicative model, the MPD is
$$\mathrm{MPD}_t = \frac{\max_{k \in N_t} S_t(k) - \min_{k \in N_t} S_t(k)}{\min_{k \in N_t} S_t(k)}$$
A series for which less than 15% of the MPD values of D10 exceed 3.0% is stable; between 15% and 25% is marginally stable; and greater than 25% is unstable. Span reports S 2.A through S 2.C give the various breakdowns for the number of times the MPD exceeded these levels.
Trading Day Factor (Table C18) For the additive model, the MPD is defined as
$$\max_{k \in N_t} TD_t(k) - \min_{k \in N_t} TD_t(k)$$
For the multiplicative model, the MPD is
$$\mathrm{MPD}_t = \frac{\max_{k \in N_t} TD_t(k) - \min_{k \in N_t} TD_t(k)}{\min_{k \in N_t} TD_t(k)}$$
The U.S. Census Bureau currently gives no recommendation concerning MPD thresholds for the trading-day factors. Span reports S 3.A through S 3.C give the various breakdowns for MPD thresholds. When TDREGR=NONE is specified, no trading-day computations are done, and this table is skipped.
Seasonally Adjusted Data (Table D11) For the additive model, the MPD is defined as
$$\max_{k \in N_t} SA_t(k) - \min_{k \in N_t} SA_t(k)$$
For the multiplicative model, the MPD is
$$\mathrm{MPD}_t = \frac{\max_{k \in N_t} SA_t(k) - \min_{k \in N_t} SA_t(k)}{\min_{k \in N_t} SA_t(k)}$$
A series for which less than 15% of the MPD values of D11 exceed 3.0% is stable; between 15% and 25% is marginally stable; and greater than 25% is unstable. Span reports S 4.A through S 4.C give the various breakdowns for the number of times the MPD exceeded these levels.
Month-to-Month Changes in the Seasonally Adjusted Data Some additional notation is needed for the month-to-month and year-to-year differences. Define $N1_t$ as
$$N1_t = \{k : \text{span } k \text{ contains months } t \text{ and } t-1\}$$
For the additive model, the month-to-month change for span $k$ is defined as
$$MM_t(k) = SA_t - SA_{t-1}$$
while for the multiplicative model
$$MM_t(k) = \frac{SA_t - SA_{t-1}}{SA_{t-1}}$$
Since this quantity is already in percentage form, the MPD for both the additive and multiplicative model is defined as
$$\mathrm{MPD}_t = \max_{k \in N1_t} MM_t(k) - \min_{k \in N1_t} MM_t(k)$$
The current recommendation of the U.S. Census Bureau is that if 35% or more of the MPD values of the month-to-month differences of D11 exceed 3.0%, then the series is usually not stable; 40% exceeding this level clearly marks an unstable series. Span reports S 5.A.1 through S 5.C give the various breakdowns for the number of times the MPD exceeds these levels. Year-to-Year Changes in the Seasonally Adjusted Data
First define $N12_t$ as
$$N12_t = \{k : \text{span } k \text{ contains months } t \text{ and } t-12\}$$
(Appropriate changes in notation for a quarterly series are obvious.) For the additive model, the year-to-year change for span $k$ is defined as
$$YY_t(k) = SA_t - SA_{t-12}$$
while for the multiplicative model
$$YY_t(k) = \frac{SA_t - SA_{t-12}}{SA_{t-12}}$$
Since this quantity is already in percentage form, the MPD for both the additive and multiplicative model is defined as
$$\mathrm{MPD}_t = \max_{k \in N12_t} YY_t(k) - \min_{k \in N12_t} YY_t(k)$$
The current recommendation of the U.S. Census Bureau is that if 10% or more of the MPD values of the year-to-year differences of D11 exceed 3.0%, then the series is usually not stable. Span reports S 6.A through S 6.C give the various breakdowns for the number of times the MPD exceeds these levels.
Data Requirements The input data set must contain either quarterly or monthly time series, and the data must be in chronological order. For the standard X-11 method, there must be at least three years of observations (12 for quarterly time series or 36 for monthly) in the input data sets or in each BY group in the input data set if a BY statement is used. For the X-11-ARIMA method, there must be at least five years of observations (20 for quarterly time series or 60 for monthly) in the input data sets or in each BY group in the input data set if a BY statement is used.
Missing Values Missing values at the beginning of a series to be adjusted are skipped. Processing starts with the first nonmissing value and continues until the end of the series or until another missing value is found. Missing values are not allowed for the DATE= variable. The procedure terminates if missing values are found for this variable. Missing values found in the PMFACTOR= variable are replaced by 100 for the multiplicative model (default) and by 0 for the additive model. Missing values can occur in the output data set. If the time series specified in the OUTPUT statement is not computed by the procedure, the values of the corresponding variable are missing. If the time series specified in the OUTPUT statement is a moving average, the values of the corresponding variable are missing for the first n and last n observations, where n depends on the length of the moving average. Additionally, if the time series specified is an irregular component modified for extremes, only the modified values are given, and the remaining values are missing.
Prior Daily Weights and Trading-Day Regression Suppose that a detailed examination of retail sales at ZXY Company indicates that certain days of the week have higher amounts of sales. In particular, Thursday, Friday, and Saturday have approximately twice the amount of sales as Monday, Tuesday, and Wednesday, and no sales occur on Sunday. This means that months with five Saturdays would have higher amounts of sales than months with only four Saturdays. This phenomenon is called a calendar effect; it can be handled in PROC X11 by using the PDWEIGHTS (prior daily weights) statement or the TDREGR= option (trading-day regression). The PDWEIGHTS statement and the TDREGR= option can be used separately or together. If the relative weights are known (as in the preceding), it is appropriate to use the PDWEIGHTS statement. If further residual calendar variation is present, TDREGR=ADJUST should also be used. If you know that a calendar effect is present, but know nothing about the relative weights, use TDREGR=ADJUST without a PDWEIGHTS statement. In this example, it is assumed that the calendar variation is due to both prior daily weights and residual variation. Thus both a PDWEIGHTS statement and TDREGR=ADJUST are specified. Note that only the relative weights are needed; in the actual computations, PROC X11 normalizes the weights to sum to 7.0. If a day of the week is not present in the PDWEIGHTS statement, it is given a value of zero. Thus “sun=0” is not needed.
proc x11 data=sales;
   monthly date=date tdregr=adjust;
   var sales;
   tables a1 a4 b15 b16 c14 c15 c18 d11;
   pdweights mon=1 tue=1 wed=1 thu=2 fri=2 sat=2;
   output out=x11out a1=a1 a4=a4 b1=b1 c14=c14 c16=c16 c18=c18 d11=d11;
run;
Tables of interest include A1, A4, B15, B16, C14, C15, C18, and D11. Table A4 contains the adjustment factors derived from the prior daily weights; Table C14 contains the extreme irregular values excluded from trading-day regression; Table C15 contains the trading-day-regression results; Table C16 contains the monthly factors derived from the trading-day regression; and Table C18 contains the final trading-day factors derived from the combined daily weights. Finally, Table D11 contains the final seasonally adjusted series.
Adjustment for Prior Factors Suppose now that a strike at ZXY Company during July and August of 1988 caused sales to decrease an estimated 50%. Since this is a one-time event with a known cause, it is appropriate to prior adjust the data to reflect the effects of the strike. This is done in PROC X11 through the use of PMFACTOR=varname (prior monthly factor) in the MONTHLY statement. In the following example, the PMFACTOR variable is named PMF. Since the estimate of the decrease in sales is 50%, PMF has a value of 50.0 for the observations corresponding to July and August 1988, and a value of 100.0 for the remaining observations. This prior adjustment on SALES is performed by replacing SALES with the calculated value (SALES/PMF) * 100.0. A value of 100.0 for PMF leaves SALES unchanged, while a value of 50.0 for PMF doubles SALES. This value is the estimate of what SALES would have been without the strike. The following example shows how this prior adjustment is accomplished.
data sales2;
   set sales;
   if '01jul1988'd <= date <= '31aug1988'd then pmf = 50;
   else pmf = 100;
run;
PLOTS< (global-plot-options) > < = (plot-request < (options) > < ... plot-request < (options) > >) >
controls the plots that are produced through ODS Graphics. When you specify only one plot request, you can omit the parentheses around the plot request. Some examples of the PLOTS= option are shown below:
plots=none
plots=all
plots=residual(none)
plots(only)=(series(acf pacf) residual(hist))
You must enable ODS Graphics before requesting plots, as shown in the following statements.
ods graphics on;
proc x12 data=sales date=date;
   var sales;
   identify diff=(0,1) sdiff=(0,1);
run;
Since no specific plot is requested in this program, the default plots associated with the PROC X12 and IDENTIFY statements are produced. For general information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS” (SAS/STAT User’s Guide). If you have enabled ODS Graphics but do not specify any specific plot request, then the default plots that are associated with each of the PROC X12 statements used in the program are produced. Line printer plots are suppressed when ODS Graphics is enabled. If NONE is specified in an option, then no plots are produced for that option. If ALL is specified without NONE in an option, then all plots are produced for that option. Global Plot Options: The global-plot-options apply to all relevant plots that are generated by the X12 procedure. The following global-plot-option is supported: ONLY
suppresses the default plots. Only the plots specifically requested are produced.
Specific Plot Options: The following list describes the specific plots and their options: ALL
produces all plots that are appropriate for the particular analysis. NONE
suppresses all plots. SERIES(< series-plot-options > )
produces plots that are associated with the identification stage of the modeling. The ACF, PACF, and SPECTRUM plots are produced by default. The following series-plot-options are available: ACF
produces the plot of autocorrelations. ALL
produces all the plots that are associated with the identification stage. NONE
suppresses all plots that are associated with the identification stage. PACF
produces the plot of partial-autocorrelations. SPECTRUM
produces the spectral plot of Table G0. Table G0 is calculated based on either Table A1, A19, B1 or E1, as specified by the SPECTRUMSERIES= option. The original data is first-differenced and transformed as specified in the TRANSFORM statement. By default, the type of spectral estimate that is used to calculate the spectral plot is the SPECTRUM. If the PERIODGRAM option is specified, then the periodogram of the series is used to calculate the spectral plot. RESIDUAL(< residual-plot-options > )
produces the regARIMA model residual series plots if the CHECK statement is specified. The ACF, PACF, HIST, SQACF, and SPECTRUM plots are produced by default. The following residual-plot-options are available: ACF
produces the plot of residual autocorrelations. ALL
produces all the residual diagnostics plots that are appropriate for the particular analysis. HIST
produces the histogram of the residuals and also the residual outliers and residual statistics tables that describe the residual histogram.
NONE
suppresses all the residual diagnostics plots. PACF
produces the plot of residual partial-autocorrelations if PRINT=PACF is specified in the CHECK statement. SPECTRUM
produces the spectral plot of Table GRs. Table GRs is calculated based on the regARIMA model residual series. By default, the type of spectral estimate used to calculate the spectral plot is the SPECTRUM. If the PERIODGRAM option is specified, then the periodogram of the series is used to calculate the spectral plot. SQACF
produces the plot of squared residual autocorrelations. SA(< sa-plot-options > ) ADJUSTED(< sa-plot-options > )
produces the seasonally adjusted series plots in the X11 statement. The SPECTRUM plot is produced by default. The following sa-plot-options are available: ALL
produces all seasonally adjusted plots. NONE
suppresses all seasonally adjusted plots. SPECTRUM
produces the spectral plot of Table G1. Table G1 is calculated based on the modified seasonally adjusted series (Table E2). The data is first-differenced and transformed as specified in the TRANSFORM statement. By default, the type of spectral estimate used to calculate the spectral plot is the SPECTRUM. If the PERIODGRAM option is specified, then the periodogram of the series is used to calculate the spectral plot. IC(< ic-plot-options > ) IRREGULAR(< ic-plot-options > )
produces the irregular series plots in the X11 statement. The SPECTRUM plot is produced by default. The following ic-plot-options are available: ALL
produces all irregular plots. NONE
suppresses all irregular plots.
SPECTRUM
produces the spectral plot of Table G2. Table G2 is calculated based on the modified irregular series (Table E3). The data is first-differenced and transformed as specified in the TRANSFORM statement. By default, the type of spectral estimate used to calculate the spectral plot is the SPECTRUM. If the PERIODGRAM option is specified, then the periodogram of the series is used to calculate the spectral plot.
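Putting these options together, the following sketch requests only selected plots (the data set, model, and statements shown are illustrative assumptions):

ods graphics on;
proc x12 data=sales date=date
         plots(only)=(residual(acf hist) sa(spectrum));
   var sales;
   arima model=((0,1,1)(0,1,1));
   estimate;
   check;   /* required for the residual plots */
   x11;     /* required for the seasonally adjusted series plots */
run;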
BY Statement BY variables ;
A BY statement can be used with PROC X12 to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input DATA= data set to be sorted in order of the BY variables.
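For example, a sketch that analyzes each region separately (the REGION variable is hypothetical):

proc sort data=sales;
   by region;
run;

proc x12 data=sales date=date;
   by region;   /* one analysis per BY group */
   var sales;
run;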
ID Statement ID variables ;
If you are creating an output data set, use the ID statement to copy values of the ID variables, in addition to the table values, into the output data set. If the VAR statement is omitted, all numeric variables that are not identified as BY variables, ID variables, the DATE= variable, or user-defined regressors are processed as time series. The ID statement has no effect when a VAR statement is specified and an output data set is not created. If the DATE= variable is specified in the PROC X12 statement, this variable is included automatically in the OUTPUT data set. If no DATE= variable is specified, the variable _DATE_ is added. The date variable (or _DATE_) values outside the range of the actual data (from forecasting) are extrapolated, while all other ID variables are missing in the forecast horizon.
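For example, a sketch that carries a hypothetical STORE identifier into the output data set:

proc x12 data=sales date=date;
   var sales;
   id store;                   /* copied into the output data set along with the date */
   forecast lead=12;
   output out=x12out b1 d11;   /* STORE is missing in the 12 forecast observations */
run;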
EVENT Statement EVENT variables < / options > ;
The EVENT statement specifies EVENTs to be included in the regression portion of the regARIMA model. Multiple EVENT statements can be specified. If a MDLINFOIN= data set is not specified, then all variables specified in the EVENT statements are applied to all BY-groups and all time series that are processed. If a MDLINFOIN= data set is specified, then the EVENT statements apply only if no regression information for the BY-group and series is available in the MDLINFOIN= data set. The EVENTs specified in the EVENT statements either must be SAS predefined EVENTs or must be defined in the data set specified in the INEVENT=SAS-data-set option of the PROC X12 statement.
For a list of SAS predefined EVENTs, see the section “EVENTKEY Statement” in Chapter 6, “The HPFEVENTS Procedure” (SAS High-Performance Forecasting User’s Guide). The EVENT statement can also be used to include outlier, level shift, and temporary change regressors that are available as predefined U.S. Census Bureau variables in the X-12-ARIMA program. For example, the following statements specify an additive outlier in January 1970 and a level shift that begins in July 1971: proc x12 data=ICMETI seasons=12 start=jan1968; event AO01JAN1970D CBLS01JUL1971D;
and the following statements specify an additive outlier in the second quarter 1970 and a temporary change that begins in the fourth quarter 1971: proc x12 data=ICMETI seasons=4 start='1970q1'; event AO01APR1970D TC01OCT1971D;
The following options can appear in the EVENT statement: B=(value < F > . . . )
specifies initial or fixed values for the EVENT parameters. For details about the B= option, see B=(value . . . ) in the section “REGRESSION Statement” on page 2326. USERTYPE=AO USERTYPE=CONSTANT USERTYPE=EASTER USERTYPE=HOLIDAY USERTYPE=LABOR USERTYPE=LOM USERTYPE=LOMSTOCK USERTYPE=LOQ USERTYPE=LPYEAR USERTYPE=LS USERTYPE=RP USERTYPE=SCEASTER USERTYPE=SEASONAL USERTYPE=TC USERTYPE=TD USERTYPE=TDSTOCK USERTYPE=THANKS USERTYPE=USER
For details about the USERTYPE= option, see the USERTYPE= option in the section “REGRESSION Statement” on page 2326.
INPUT Statement INPUT variables < / options > ;
The INPUT statement specifies variables in the PROC X12 DATA= or AUXDATA= data set that are to be used as regressors in the regression portion of the regARIMA model. The variables in the data set should contain the values for each observation that define the regressor. Future values of regression variables should also be included in the DATA= data set if the time series listed in the VAR statement is to be extended with regARIMA forecasts. Multiple INPUT statements can be specified. If a MDLINFOIN= data set is not specified, then all variables listed in the INPUT statements are applied to all BY-groups and all time series that are processed. If a MDLINFOIN= data set is specified, then the INPUT statements apply only if no regression information for the BY-group and series is available in the MDLINFOIN= data set. The following options can appear in the INPUT statement: B=(value . . . )
specifies initial or fixed values for the INPUT variable parameters. For details about the B= option, see the B=(value . . . ) option in the section “REGRESSION Statement” on page 2326. USERTYPE=AO USERTYPE=CONSTANT USERTYPE=EASTER USERTYPE=HOLIDAY USERTYPE=LABOR USERTYPE=LOM USERTYPE=LOMSTOCK USERTYPE=LOQ USERTYPE=LPYEAR USERTYPE=LS USERTYPE=RP USERTYPE=SCEASTER USERTYPE=SEASONAL USERTYPE=TC USERTYPE=TD USERTYPE=TDSTOCK USERTYPE=THANKS USERTYPE=USER
For details about the USERTYPE= option, see the USERTYPE= option in the section “REGRESSION Statement” on page 2326.
ADJUST Statement ADJUST options ;
The ADJUST statement adjusts the series for leap year and length-of-period factors prior to estimating a regARIMA model. The “Prior Adjustment Factors” table is associated with the ADJUST statement. The following option can appear in the ADJUST statement: PREDEFINED=LOM PREDEFINED=LOQ PREDEFINED=LPYEAR
specifies length-of-month adjustment, length-of-quarter adjustment, or leap year adjustment. PREDEFINED=LOM and PREDEFINED=LOQ are equivalent because the actual adjustment is determined by the interval of the time series. Also, because leap year adjustment is a limited form of length-of-period adjustment, only one type of predefined adjustment can be specified. The PREDEFINED= option should not be used in conjunction with PREDEFINED=TD or PREDEFINED=TD1COEF in the REGRESSION statement or MODE=ADD or MODE=PSEUDOADD in the X11 statement. PREDEFINED=LPYEAR cannot be specified unless the series is log transformed. If the series is to be transformed by using a Box-Cox or logistic transformation, the series is first adjusted according to the ADJUST statement, and then it is transformed. In the case of a length-of-month adjustment for the series with observations $Y_t$, each observation is first divided by the number of days in that month, $m_t$, and then multiplied by the average length of month (30.4375), resulting in $(30.4375 \, Y_t)/m_t$. Length-of-quarter adjustments are performed in a similar manner, resulting in $(91.3125 \, Y_t)/q_t$, where $q_t$ is the length in days of quarter $t$. Forecasts of the transformed and adjusted data are transformed and adjusted back to the original scale for output.
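As a numeric check, a 31-day January is divided by 31 and multiplied by 30.4375, a length-of-month factor of about 0.982. A minimal sketch of the syntax (the data set and model are illustrative assumptions):

proc x12 data=sales date=date;
   var sales;
   transform function=log;    /* PREDEFINED=LPYEAR requires a log transformation */
   adjust predefined=lpyear;  /* leap year adjustment prior to the regARIMA model */
   arima model=((0,1,1)(0,1,1));
   estimate;
run;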
ARIMA Statement ARIMA options ;
The ARIMA statement specifies the ARIMA part of the regARIMA model. This statement defines a pure ARIMA model if no REGRESSION statements, INPUT statements, or EVENT statements are specified. The ARIMA part of the model can include multiplicative seasonal factors. The following option can appear in the ARIMA statement: MODEL=((p d q) (P D Q)s)
specifies the ARIMA model. The format follows standard Box-Jenkins notation (Box, Jenkins, and Reinsel 1994). The nonseasonal AR and MA orders are given by p and q, respectively, while the seasonal AR and MA orders are given by P and Q. The number of differences and
seasonal differences are given by d and D, respectively. The notation (p d q) and (P D Q) can also be specified as (p, d, q) and (P, D, Q). The maximum lag of any AR or MA parameter is 36. The maximum value of a difference order, d or D, is 144. All values for p, d, q, P, D, and Q should be nonnegative integers. The seasonality parameter, s, should be a positive integer. If s is omitted, it is set equal to the value that is specified in the SEASONS= option in the PROC X12 statement. For example, the following statements specify an ARIMA $(2,1,1)(1,1,0)_{12}$ model:
proc x12 data=ICMETI seasons=12 start=jan1968;
   arima model=((2,1,1)(1,1,0));
CHECK Statement CHECK options ;
The CHECK statement produces statistics for diagnostic checking of residuals from the estimated regARIMA model. The following tables that are associated with diagnostic checking are displayed in the output: “Autocorrelation of regARIMA Model Residuals,” “Partial Autocorrelation of regARIMA Model Residuals,” “Autocorrelation of Squared regARIMA Model Residuals,” “Outliers of the Unstandardized Residuals,” “Summary Statistics for the Unstandardized Residuals,” “Normality Statistics for regARIMA Model Residuals,” and “Table G Rs: 10*LOG(SPECTRUM) of the regARIMA Model Residuals.” If ODS GRAPHICS ON is specified, the following plots that are associated with diagnostic checking output are produced: the autocorrelation function (ErrorACF) plot of the residuals, the partial autocorrelation function (ErrorPACF) plot of the residuals, the autocorrelation function (SqErrorACF) plot of the squared residuals, a histogram (ResidualHistogram) of the residuals, and a spectral plot (SpectralPlot) of the residuals. See the PLOTS=RESIDUAL option of the PROC X12 statement for further information about controlling the display of plots. The residual histogram displayed by the X12 procedure shows the distribution of the unstandardized, uncentered regARIMA model residuals; the residual histogram displayed by the U.S. Census Bureau’s X-12-ARIMA seasonal adjustment program displays standardized and mean-centered residuals. The following options can appear in the CHECK statement: MAXLAG=value
specifies the number of lags for the residual sample autocorrelation function (ACF) and partial autocorrelation function (PACF). The default is 36 for monthly series and 12 for quarterly series. The minimum value for MAXLAG= is 1. For the table “Autocorrelation of Squared regARIMA Model Residuals” and the corresponding SqErrorACF plot, the maximum number of lags calculated is 12 for monthly series and 4 for quarterly series. The MAXLAG= option can only reduce the number of lags for this table and plot.
PRINT=ACF PRINT=PACF PRINT=ACFSQUARED PRINT=RESIDUALSTATISTICS PRINT=RESIDUALOUTLIER PRINT=NORM PRINT=SPECRESIDUAL PRINT=ALL PRINT=NONE PRINT=(options)
specifies the diagnostic checking tables to be displayed. If the PRINT= option is not specified, the default is equivalent to specifying PRINT=(ACF ACFSQUARED RESIDUALOUTLIER RESIDUALSTATISTICS NORM SPECRESIDUAL). If PRINT=NONE is specified and no other PRINT= option is specified, then none of the tables that are associated with diagnostic checking are displayed. However, PRINT=NONE has no effect if other PRINT= options are specified in the CHECK statement. PRINT=ALL specifies that all tables related to diagnostic checking be displayed. PRINT=ACF displays the table titled “Autocorrelation of regARIMA Model Residuals.” PRINT=PACF displays the table titled “Partial Autocorrelation of regARIMA Model Residuals.” PRINT=ACFSQUARED displays the table titled “Autocorrelation of Squared regARIMA Model Residuals.” PRINT=RESIDUALOUTLIER or PRINT=RESOUTLIER displays the table “Outliers of the Unstandardized Residuals” if the residuals contain outliers. PRINT=RESIDUALSTATISTICS or PRINT=RESSTAT displays the table titled “Summary Statistics for the Unstandardized Residuals.” PRINT=NORM displays the table titled “Normality Statistics for regARIMA Model Residuals”. Measures of normality included in this table are skewness, Geary’s a statistic, and kurtosis.
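For example, a sketch that requests residual diagnostics at 24 lags and only selected tables (the data set and model are illustrative assumptions):

proc x12 data=sales date=date;
   var sales;
   arima model=((0,1,1)(0,1,1));
   estimate;
   check maxlag=24 print=(acf pacf norm);
run;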
ESTIMATE Statement ESTIMATE options ;
The ESTIMATE statement estimates the regARIMA model. The regARIMA model is specified by the REGRESSION, INPUT, EVENT, and ARIMA statements or by the MDLINFOIN= data set. Estimation output includes point estimates and standard errors for all estimated AR, MA, and regression parameters; the maximum likelihood estimate of the variance $\sigma^2$; $t$ statistics for individual regression parameters; $\chi^2$ statistics for assessing the joint significance of the parameters associated with certain regression effects (if included in the model); and likelihood-based model selection
statistics (if the exact likelihood function is used). The regression effects for which $\chi^2$ statistics are produced are fixed seasonal effects. Tables displayed in the output associated with estimation are “Exact ARMA Likelihood Estimation Iteration Tolerances,” “Average Absolute Percentage Error in within-Sample Forecasts,” “ARMA Iteration History,” “AR/MA Roots,” “Exact ARMA Likelihood Estimation Iteration Summary,” “Regression Model Parameter Estimates,” “Chi-Squared Tests for Groups of Regressors,” “Exact ARMA Maximum Likelihood Estimation,” and “Estimation Summary.” The following options can appear in the ESTIMATE statement: MAXITER=value
specifies the maximum number of iterations used in estimating the AR and MA parameters. For models with regression variables, this limit applies to the total number of ARMA iterations over all iterations of the iterative generalized least squares (IGLS) algorithm. For models without regression variables, this is the maximum number of iterations allowed for the set of ARMA iterations. The default is MAXITER=200. TOL=value
specifies the convergence tolerance for the nonlinear estimation. Absolute changes in the loglikelihood are compared to the TOL= value to check convergence of the estimation iterations. For models with regression variables, the TOL= value is used to check convergence of the IGLS iterations (where the regression parameters are reestimated for each new set of AR and MA parameters). For models without regression variables, there are no IGLS iterations, and the TOL= value is then used to check convergence of the nonlinear iterations used to estimate the AR and MA parameters. The default value is TOL=0.00001. The minimum tolerance value is a positive value based on the machine precision and the length of the series. If a tolerance less than the minimum supported value is specified, an error message is displayed and the series is not processed. ITPRINT
specifies that the “Iteration History” table be displayed. This table includes detailed output for estimation iterations, including log-likelihood values, parameters, counts of function evaluations, and iterations. It is useful to examine the “Iteration History” table when errors occur within estimation iterations. By default, only successful iterations are displayed, unless the PRINTERR option is specified. An unsuccessful iteration is an iteration that is restarted due to a problem such as a root inside the unit circle. Successful iterations have a status of 0. If restarted iterations are displayed, a note at the end of the table gives definitions for status codes that indicate a restarted iteration. For restarted iterations, the number of function evaluations and the number of iterations will be –1, which is displayed as missing. If regression parameters are included in the model, then both IGLS and ARMA iterations are included in the table. The number of function evaluations is a cumulative total. PRINTERR
causes restarted iterations to be included in the “Iteration History” table if ITPRINT is specified or creates the “Restarted Iterations” table if ITPRINT is not specified. Whether or not PRINTERR is specified, a WARNING message is printed to the log file if any iteration is restarted during estimation.
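For example, a sketch that tightens the convergence tolerance, raises the iteration limit, and prints the iteration history (the data set and model are illustrative assumptions):

proc x12 data=sales date=date;
   var sales;
   arima model=((2,1,0)(0,1,1));
   estimate maxiter=500 tol=0.000001 itprint;
run;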
FORECAST Statement FORECAST options ;
The FORECAST statement uses the estimated model to forecast the time series. The output contains point forecasts and forecast statistics for the transformed and original series. The following option can appear in the FORECAST statement: LEAD=value
specifies the number of periods ahead to forecast for regARIMA extension of the series. The default is the number of periods in a year (4 or 12), and the maximum is 60. Setting LEAD=0 specifies that the series not be extended by forecasts. The LEAD= value also controls the number of forecasts that are displayed in Table D10.A. However, if the series is not extended by forecasts (LEAD=0), then the default year of forecasts is displayed in Table D10.A. Note that forecast values in Table D10.A are calculated using the method shown on page 148 of Ladiray and Quenneville (2001) based on values that are displayed in Table D10. The regARIMA forecasts affect the D10.A forecasts only indirectly through the impact of the regARIMA forecasts on the seasonal factors that are shown in Table D10. Tables that contain forecasts, standard errors, and confidence limits are displayed in association with the FORECAST statement. If the data is transformed, then two tables are displayed: one table for the original data, and one table for the transformed data.
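For example, a sketch that extends a monthly series by two years of forecasts (the data set and model are illustrative assumptions):

proc x12 data=sales date=date;
   var sales;
   arima model=((0,1,1)(0,1,1));
   estimate;
   forecast lead=24;   /* 24 months of regARIMA forecasts */
run;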
IDENTIFY Statement IDENTIFY options ;
The IDENTIFY statement is used to produce plots of the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) for identifying the ARIMA part of a regARIMA model. The sample ACF and PACF are produced for all combinations of the nonseasonal and seasonal differences of the data specified by the DIFF= and SDIFF= options. The original series is first transformed as specified in the TRANSFORM statement. If the model includes a regression component (specified using the REGRESSION, INPUT, and EVENT statements or the MDLINFOIN= data set), both the transformed series and the regressors are differenced at the highest order that is specified in the DIFF= and SDIFF= option. The parameter estimates are calculated using the differenced data. Then the undifferenced regression effects (with the exception of a constant term) are removed from the undifferenced data to produce undifferenced regression residuals. The ACFs and PACFs are calculated for the specified differences of the undifferenced regression residuals. If the model does not include a regression component, then the ACFs and PACFs are calculated for the specified differences of the transformed data.
Tables displayed in association with identification are “Autocorrelation of Model Residuals” and “Partial Autocorrelation of Model Residuals.” If the model includes a regression component (specified using the REGRESSION, INPUT, and EVENT statements or the MDLINFOIN= data set), then the “Regression Model Parameter Estimates” table is also displayed if the PRINTREG option is specified. The following options can appear in the IDENTIFY statement: DIFF=(order, order, order )
specifies orders of nonseasonal differencing to use in model identification. The value 0 specifies no differencing, the value 1 specifies one nonseasonal difference $(1-B)$, the value 2 specifies two nonseasonal differences $(1-B)^2$, and so forth. The ACFs and PACFs are produced for all orders of nonseasonal differencing specified, in combination with all orders of seasonal differencing that are specified in the SDIFF= option. The default is DIFF=(0). You can specify up to three values for nonseasonal differences. SDIFF=(order, order, order )
specifies orders of seasonal differencing to use in model identification. The value 0 specifies no seasonal differencing, the value 1 specifies one seasonal difference $(1-B^s)$, the value 2 specifies two seasonal differences $(1-B^s)^2$, and so forth. Here the value for s corresponds to the period specified in the SEASONS= option in the PROC X12 statement. The value of the SEASONS= option is supplied explicitly or is implicitly supplied through the INTERVAL= option or the values of the DATE= variable. The ACFs and PACFs are produced for all orders of seasonal differencing specified, in combination with all orders of nonseasonal differencing specified in the DIFF= option. The default is SDIFF=(0). You can specify up to three values for seasonal differences. For example, the following statement produces ACFs and PACFs for two levels of differencing: $(1-B)$ and $(1-B)(1-B^s)$:
identify diff=(1) sdiff=(0, 1);
MAXLAG=value
specifies the number of lags for the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) of the regression residuals for model identification. The default is 36 for monthly series and 12 for quarterly series. MAXLAG applies to both tables and plots. The minimum value for MAXLAG= is 1. PRINTREG
causes the “Regression Model Parameter Estimates” table to be printed if the REGRESSION statement is present. By default, this table is not printed.
AUTOMDL Statement AUTOMDL options ;
The AUTOMDL statement is used to invoke the automatic model selection procedure of the X-12-ARIMA method. This method is based largely on the TRAMO (time series regression with ARIMA noise, missing values, and outliers) method by Gomez and Maravall (1997a, b). If the AUTOMDL statement is used without the OUTLIER statement, then only missing values regressors are included in the regARIMA model. If the AUTOMDL and the OUTLIER statements are used, then both missing values regressors and regressors for automatically identified outliers are included in the regARIMA model. For more information about missing value regressors, see the section “Missing Values” on page 2339. If both the AUTOMDL statement and the ARIMA statement are present, the ARIMA statement is ignored. The ARIMA statement specifies the model, while the AUTOMDL statement allows the X12 procedure to select the model. If the AUTOMDL statement is specified and a data set is specified in the MDLINFOIN= option of the PROC X12 statement, then the AUTOMDL statement is ignored if the specified data set contains a model specification for the series. If no model for the series is specified in the MDLINFOIN= data set, the AUTOMDL or ARIMA statement is used to determine the model. Thus, it is possible to give a specific model for some series and automatically identify the model for other series by using both the MDLINFOIN= option and the AUTOMDL statement. When AUTOMDL is specified, the X12 procedure compares a model selected using a TRAMO method to a default model. The TRAMO method is implemented first, and involves two parts: identifying the orders of differencing and identifying the ARIMA model. The table “ARIMA Estimates for Unit Root Identification” provides details about the identification of the orders of differencing, while the table “Results of Unit Root Test for Identifying Orders of Differencing” shows the orders of differencing selected by TRAMO. The table “Models Estimated by Automatic ARIMA Model Selection Procedure” provides details regarding the TRAMO automatic model selection, and the table “Best Five ARIMA Models Chosen by Automatic Modeling” ranks the best five models estimated using the TRAMO method. The “Comparison of Automatically Selected Model and Default Model” table compares the model selected by the TRAMO method to a default model. At this point in the processing, if the default model is selected over the TRAMO model, then PROC X12 displays a note. No note is displayed if the TRAMO model is selected. PROC X12 then performs checks for unit roots, over-differencing, and insignificant ARMA coefficients. If the model is changed due to any of these tests, a note is displayed. The last table, “Final Automatic Model Selection,” shows the results of the automatic model selection. The following options can appear in the AUTOMDL statement: MAXORDER=(nonseasonal order, seasonal order )
specifies the maximum orders of nonseasonal and seasonal ARMA polynomials for the automatic ARIMA model identification procedure. The maximum order for the nonseasonal ARMA parameters is 4, and the maximum order for the seasonal ARMA is 2.
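For example, a sketch that restricts the automatic search to nonseasonal ARMA orders of at most 3 and seasonal orders of at most 1 (the data set is hypothetical):

proc x12 data=sales date=date;
   var sales;
   automdl maxorder=(3,1);
   estimate;
   x11;
run;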
DIFFORDER=(nonseasonal order, seasonal order )
specifies the fixed orders of differencing to be used in the automatic ARIMA model identification procedure. When the DIFFORDER= option is used, only the AR and MA orders are automatically identified. Acceptable values for the regular differencing orders are 0, 1, and 2; acceptable values for the seasonal differencing orders are 0 and 1. If the MAXDIFF= option is also specified, then the DIFFORDER= option is ignored. There are no default values for DIFFORDER. If neither the DIFFORDER= option nor the MAXDIFF= option is specified, then the default is MAXDIFF=(2,1). MAXDIFF=(nonseasonal order, seasonal order )
specifies the maximum orders of regular and seasonal differencing for the automatic identification of differencing orders. When the MAXDIFF= option is specified, the differencing orders are identified first, and then the AR and MA orders are identified. Acceptable values for the regular differencing orders are 1 and 2. The only acceptable value for the seasonal differencing order is 1. If both the MAXDIFF= option and the DIFFORDER= option are specified, then the DIFFORDER= option is ignored. If neither the DIFFORDER= option nor the MAXDIFF= option is specified, the default is MAXDIFF=(2,1).

NOINT
suppresses the fitting of a constant or intercept parameter in the model.

PRINT=UNITROOTTEST
PRINT=AUTOCHOICE
PRINT=UNITROOTTESTMDL
PRINT=AUTOCHOICEMDL
PRINT=BEST5MODEL
lists the tables to be displayed in the output.

PRINT=AUTOCHOICE displays the tables titled “Comparison of Automatically Selected Model and Default Model” and “Final Automatic Model Selection.” The “Comparison of Automatically Selected Model and Default Model” table compares a default model to the model chosen by the TRAMO-based automatic modeling method. The “Final Automatic Model Selection” table indicates which model has been chosen automatically. If the PRINT= option is not specified, PRINT=AUTOCHOICE is the default.

PRINT=UNITROOTTEST causes the table titled “Results of Unit Root Test for Identifying Orders of Differencing” to be printed. This table displays the orders of differencing that were automatically selected by AUTOMDL. Unless the nonseasonal and seasonal differencing orders are specified by using the DIFFORDER= option, AUTOMDL automatically identifies the orders of differencing.

PRINT=UNITROOTTESTMDL displays the table titled “ARIMA Estimates for Unit Root Identification.” This table summarizes the various models that were considered by the TRAMO automatic selection method while identifying the orders of differencing, along with the statistics associated with those models. The unit root identification method first attempts to obtain the coefficients by using the Hannan-Rissanen method. If Hannan-Rissanen estimation cannot be performed, the algorithm attempts to obtain the coefficients by using conditional likelihood estimation.
PRINT=AUTOCHOICEMDL displays the table “Models Estimated by Automatic ARIMA Model Selection Procedure.” This table summarizes the various models that were considered by the TRAMO automatic model selection method and their measures of fit.

PRINT=BEST5MODEL displays the table “Best Five ARIMA Models Chosen by Automatic Modeling.” This table ranks the five best models that were considered by the TRAMO automatic modeling method.

BALANCED
specifies that the automatic modeling procedure prefer balanced models over unbalanced models. A balanced model is one in which the sum of the AR, seasonal AR, differencing, and seasonal differencing orders equals the sum of the MA and seasonal MA orders. Specifying BALANCED gives the same preference as the TRAMO program. If BALANCED is not specified, all models are given equal consideration.

HRINITIAL
specifies that Hannan-Rissanen estimation be done before exact maximum likelihood estimation to provide initial values. If HRINITIAL is specified, then models for which the Hannan-Rissanen estimation has an unacceptable coefficient are rejected.

ACCEPTDEFAULT
specifies that the default model be chosen if its Ljung-Box Q is acceptable.

LJUNGBOXLIMIT=value
specifies the acceptance criterion for the confidence coefficient of the Ljung-Box Q statistic. If the Ljung-Box Q for a final model is greater than this value, the model is rejected, the outlier critical value is reduced, and outlier identification is redone with the reduced value. See the REDUCECV= option for more information. The value specified in the LJUNGBOXLIMIT= option must be greater than 0 and less than 1. The default value is 0.95.

REDUCECV=value
specifies the percentage by which the outlier critical value is reduced when a final model is found to have an unacceptable confidence coefficient for the Ljung-Box Q statistic. This value should be between 0 and 1. The default value is 0.14286.

ARMACV=value
specifies the threshold value for the t statistics that are associated with the highest-order ARMA coefficients. As a check of model parsimony, the parameter estimates and t statistics of the highest-order ARMA coefficients are examined to determine whether each such coefficient is insignificant. An ARMA coefficient is considered to be insignificant if the absolute value of the t statistic that is displayed in the table “Exact ARMA Maximum Likelihood Estimation” is below the value specified in the ARMACV= option and the absolute value of the parameter estimate is reliably close to zero. The absolute value of the parameter estimate is considered to be reliably close to zero if it is below 0.15 for 150 or fewer observations or below 0.1 for more than 150 observations. If the highest-order ARMA coefficient is found to be insignificant, then the order of the ARMA model is reduced. For example, if AUTOMDL identifies a (3 1 1)(0 0 1) model and the parameter estimate of the seasonal MA lag of order 1 is –0.09 with a t value of –0.55, then the ARIMA model is reduced to at least (3 1 1)(0 0 0). After the model is reestimated, the check for insignificant coefficients is performed again. If ARMACV=0.54 is specified in the preceding example, then the coefficient is not found to be insignificant and the model is not reduced.
If a constant is allowed in the model and the t value associated with the constant parameter estimate is below the ARMACV= critical value, then the constant is considered to be insignificant and is removed from the model. Note that if a constant is added to or removed from the model and the ARIMA model subsequently changes, the t statistic for the constant parameter estimate also changes. Thus, changing the ARMACV= value does not necessarily add a constant term to the model or remove one from it. The value specified in the ARMACV= option should be greater than zero. The default value is 1.0.
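The following statements are a minimal sketch of automatic model selection; the data set SALESDATA, its date variable DATE, and the series SALES are hypothetical stand-ins for your own monthly data.

   proc x12 data=salesdata date=date;
      var sales;
      /* automatically identify the differencing and ARMA orders,
         restricting the search to nonseasonal ARMA order 2 and
         seasonal ARMA order 1 */
      automdl maxorder=(2,1);
      /* because OUTLIER is also specified, regressors for
         automatically identified outliers are included in the
         regARIMA model along with any missing value regressors */
      outlier;
   run;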
OUTPUT Statement

OUTPUT OUT=SAS-data-set tablename1 tablename2 . . . ;
The OUTPUT statement creates an output data set that contains specified tables. The data set is named by the OUT= option.

OUT=SAS-data-set
names the data set to contain the specified tables. If the OUT= option is omitted, the data set is named by using the default DATAn convention.

For each table to be included in the output data set, you must specify the X12 tablename keyword. The keyword corresponds to the title label used by the Census Bureau X12-ARIMA software. Currently available tables are A1, A2, A6, A7, A8, A8AO, A8LS, A8TC, A9, A10, A19, B1, C17, C20, D1, D7, D8, D9, D10, D10B, D10D, D11, D11A, D11F, D11R, D12, D13, D16, D16B, D18, E1, E2, E3, E5, E6, E6A, E6R, E7, E8, and MV1. If no table is specified in the OUTPUT statement, Table A1 is output to the OUT= data set by default. The tablename keywords that can be used in the OUTPUT statement are listed in the section “Displayed Output/ODS Table Names/OUTPUT Tablename Keywords” on page 2342. The following is an example of a VAR statement and an OUTPUT statement:

   var sales costs;
   output out=out_x12 b1 d11;
The default variable name used in the output data set is the input variable name followed by an underscore and the corresponding table name. In this example, the variable sales_B1 contains the Table B1 values for the variable sales, the variable costs_B1 contains the Table B1 values for the variable costs, the variable sales_D11 contains the Table D11 values for the variable sales, and the variable costs_D11 contains the Table D11 values for the variable costs. If necessary, the variable name is shortened so that the table name can be added. If the DATE= variable is specified in the PROC X12 statement, then that variable is included in the output data set; otherwise, a variable named _DATE_ is written to the OUT= data set as the date identifier.
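In context, these statements might appear in a complete program such as the following sketch, in which the data set SALESDATA and its variables DATE, SALES, and COSTS are hypothetical; the X11 statement requests the seasonal adjustment that produces the D11 table.

   proc x12 data=salesdata date=date;
      var sales costs;
      x11;                           /* seasonal adjustment step   */
      output out=out_x12 b1 d11;     /* write Tables B1 and D11    */
   run;

The OUT_X12 data set then contains the date identifier plus the variables sales_B1, costs_B1, sales_D11, and costs_D11.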
OUTLIER Statement

OUTLIER options ;
The OUTLIER statement specifies that the X12 procedure perform automatic detection of additive point outliers, temporary change outliers, level shifts, or any combination of the three when using the specified model. After outliers are identified, the appropriate regression variables are incorporated into the model as “Automatically Identified Outliers,” and the model is reestimated. This procedure is repeated until no additional outliers are found. The OUTLIER statement also identifies potential outliers and lists them in the table “Potential Outliers” in the displayed output. Potential outliers are identified by decreasing the critical value by 0.5. In the output, the default initial critical values used for outlier detection in a given analysis are displayed in the table “Critical Values to Use in Outlier Detection.” Outliers that are detected and incorporated into the model are displayed in the output in the table “Regression Model Parameter Estimates,” where the regression variable is listed as “Automatically Identified.” A short example appears after the option descriptions.

The following options can appear in the OUTLIER statement:

SPAN=(mmmyy,mmmyy)
SPAN=(’yyQq’,’yyQq’)
gives the dates of the first and last observations to define a subset for searching for outliers. A single date in parentheses is interpreted to be the starting date of the subset. To specify only the ending date, use SPAN=(,mmmyy) or SPAN=(,’yyQq’). If the starting or ending date is omitted, then the first or last date, respectively, of the input data set or BY group is assumed. Because the dates are input as strings and the quarterly dates begin with a numeric character, the specification for a quarterly date must be enclosed in quotation marks. A four-digit year can be specified. If a two-digit year is specified, the value specified in the YEARCUTOFF= SAS system option applies.

TYPE=NONE
TYPE=(outlier types)
lists the outlier types to be detected by the automatic outlier identification method. TYPE=NONE turns off outlier detection. The valid outlier types are AO, LS, and TC. The default is TYPE=(AO LS).

CV=value
specifies an initial critical value to use for detection of all types of outliers. The absolute value of the t statistic associated with an outlier parameter estimate is compared with the critical value to determine the significance of the outlier. If the CV= option is not specified, then the default initial critical value is computed by using a formula presented by Ljung (1993), which is based on the number of observations or model span used in the analysis. Table 34.2 gives default critical values for various series lengths. Increasing the critical value decreases the sensitivity of the outlier detection routine and can reduce the number of observations treated as outliers. If the automatic model identification process fails to identify an acceptable model, it might lower the critical value by a certain percentage.
Table 34.2  Default Critical Values for Outlier Identification

Number of Observations    Outlier Critical Value
          1                      1.96
          2                      2.24
          3                      2.44
          4                      2.62
          5                      2.74
          6                      2.84
          7                      2.92
          8                      2.99
          9                      3.04
         10                      3.09
         11                      3.13
         12                      3.16
         24                      3.42
         36                      3.55
         48                      3.63
         72                      3.73
         96                      3.80
        120                      3.85
        144                      3.89
        168                      3.92
        192                      3.95
        216                      3.97
        240                      3.99
        264                      4.01
        288                      4.03
        312                      4.04
        336                      4.05
        360                      4.07
AOCV=value
specifies a critical value to use for additive point outliers. If AOCV= is specified, this value overrides any default critical value for AO outliers. See the CV= option for more details.

LSCV=value
specifies a critical value to use for level shift outliers. If LSCV= is specified, this value overrides any default critical value for LS outliers. See the CV= option for more details.

TCCV=value
specifies a critical value to use for temporary change outliers. If TCCV is specified, this value overrides any default critical value for TC outliers. See the CV= option for more details.
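The following statements sketch a typical use of the OUTLIER statement; the data set and variable names are hypothetical, and the option values are arbitrary choices for illustration.

   proc x12 data=salesdata date=date;
      var sales;
      arima model=((0,1,1)(0,1,1));
      /* search for all three outlier types from January 1990 onward,
         using a critical value higher than the series-length default */
      outlier type=(ao ls tc) cv=4.0 span=(jan90);
   run;

Here outlier detection searches for AO, LS, and TC outliers from January 1990 through the end of the series, and the CV= value makes the detection less sensitive than the default.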
REGRESSION Statement

REGRESSION PREDEFINED=variables < / B=(value <F>) > ;
REGRESSION USERVAR=variables < / B=(value <F>) USERTYPE=option > ;
The REGRESSION statement includes regression variables in a regARIMA model or specifies regression variables whose effects are to be removed by the IDENTIFY statement to aid in ARIMA model identification. Predefined regression variables are selected with the PREDEFINED= option. User-defined regression variables are specified with the USERVAR= option. The currently available predefined variables are listed in Table 34.3. Table A6 in the displayed output generated by the X12 procedure provides information related to trading day effects. Table A7 provides information related to holiday effects. Tables A8, A8AO, A8LS, and A8TC provide information related to outlier factors. Ramps and level shifts are combined in the A8LS table. The A8AO, A8LS, and A8TC tables are available only when more than one outlier type is present in the model. Table A9 provides information about user-defined regression effects. Table A10 provides information about the user-defined seasonal component.

Missing values in the span of an input series automatically create missing value regressors. See the NOTRIMMISS option of the PROC X12 statement and the section “Missing Values” on page 2339 for further details about missing values.

Combining your model with additional predefined regression variables can result in a singularity problem. If a singularity occurs, then you might need to alter either the model or the choices of the predefined regressors in order to perform the regression successfully.

In order to seasonally adjust a series that uses a regARIMA model, the factors derived from regression are used as multiplicative or additive factors based on the mode of seasonal decomposition. Therefore, regressors should be defined that are appropriate to the mode of the seasonal decomposition, so that meaningful combined adjustment factors can be derived and adjustment diagnostics can be generated. For example, if a regARIMA model is applied to a log-transformed series, then the regression factors are expressed as ratios, which match the form of the seasonal factors that are generated by the multiplicative or log-additive adjustment modes. Conversely, if a regARIMA model is fit to the original series, then the regression factors are measured on the same scale as the original series, which matches the scale of the seasonal factors that are generated by the additive adjustment mode. Note that the default transformation (no transformation) and the default seasonal adjustment mode (multiplicative) are in conflict. Thus, when you specify the X11 statement and any of the REGRESSION, INPUT, or EVENT statements, you must also specify either a transformation (by using the TRANSFORM statement) or a different mode (by using the MODE= option of the X11 statement) in order to seasonally adjust the data by using the regARIMA model.

According to Ladiray and Quenneville (2001), “X-12-ARIMA is based on the same principle [as the X-11 method] but proposes, in addition, a complete module, called Reg-ARIMA, that allows for the initial series to be corrected for all sorts of undesirable effects. These effects are estimated using regression models with ARIMA errors (Findley et al. [23]).” The REGRESSION, INPUT, and EVENT statements specify these regression effects. Predefined effects that can be corrected in this manner are listed in the PREDEFINED= option. You can create your own definitions to remove other effects by using the USERVAR= option and the EVENT statement.
Either the PREDEFINED= option or the USERVAR= option can be specified in a single REGRESSION statement, but not both. Multiple REGRESSION statements can be used, as the sketch that follows illustrates.
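For example, the following statements (a sketch in which PROMO is a hypothetical user-defined variable in the input data set) use two REGRESSION statements to combine a predefined trading day effect with a user-defined regressor:

   regression predefined=td;      /* predefined trading day effect    */
   regression uservar=promo;      /* user-defined regression variable */

Each statement supplies only one of the two option types, so the restriction that a single statement cannot contain both is respected.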
The following options can appear in the REGRESSION statement:

PREDEFINED=CONSTANT
PREDEFINED=EASTER(value)
PREDEFINED=LABOR(value)
PREDEFINED=LOM
PREDEFINED=LOMSTOCK
PREDEFINED=LOQ
PREDEFINED=LPYEAR
PREDEFINED=SCEASTER(value)
PREDEFINED=SEASONAL
PREDEFINED=SINCOS(value . . . )
PREDEFINED=TD
PREDEFINED=TD1COEF
PREDEFINED=TD1NOLPYEAR
PREDEFINED=TDNOLPYEAR
PREDEFINED=TDSTOCK(value)
PREDEFINED=THANK(value)
lists the predefined regression variables to be included in the model. Data values for these variables are calculated by the program, mostly as functions of the calendar. Table 34.3 gives definitions for the available predefined variables. The values LOM and LOQ are equivalent: the actual regression is controlled by the PROC X12 SEASONS= option. Multiple predefined regression variables can be used. The syntax for using both a length-of-month and a seasonal regression can be in one of the following forms:

   regression predefined=lom seasonal;
   regression predefined=(lom seasonal);
   regression predefined=lom predefined=seasonal;
Certain restrictions apply when you use more than one predefined regression variable. Only one of TD, TDNOLPYEAR, TD1COEF, or TD1NOLPYEAR can be specified. LPYEAR cannot be used with TD, TD1COEF, LOM, LOMSTOCK, or LOQ. LOM or LOQ cannot be used with TD or TD1COEF. The following restriction also applies to the SINCOS predefined regression variable: if SINCOS is specified, then the INTERVAL= option or the SEASONS= option must also be specified, because restrictions on this regression variable depend on the frequency of the data.
The predefined regression variables TDSTOCK, SCEASTER, EASTER, LABOR, THANK, and SINCOS require extra parameters. Only one TDSTOCK regressor can be implemented in the regression model. If multiple TDSTOCK variables are specified, PROC X12 uses the last TDSTOCK variable specified. For SCEASTER, EASTER, LABOR, THANK, and SINCOS, multiple regressors can be implemented in the model by specifying the variables with different parameters. For example, the following statement specifies two EASTER regressors with widths 7 and 14:

   regression predefined=easter(7) easter(14);
For SINCOS, specifying a parameter includes both the sine and the cosine regressor except for the highest order allowed (2 for quarterly data and 6 for monthly data). The most common use of the SINCOS variable for quarterly data is

   regression predefined=sincos(1,2);
and for monthly data is

   regression predefined=sincos(1,2,3,4,5,6);
These statements include 3 and 11 regressors in the model, respectively.

Table 34.3  Predefined Regression Variables in X-12-ARIMA

Trend constant (variable CONSTANT):
$\dfrac{1}{(1-B)^d (1-B^s)^D}\, I(t \ge 1)$, where $I(t \ge 1) = \begin{cases} 1 & \text{for } t \ge 1 \\ 0 & \text{for } t < 1 \end{cases}$

Easter holiday (variable EASTER(w)):
$E(w,t) = \frac{1}{w}\, n_t$, where $n_t$ is the number of the $w$ days before Easter that fall in month (or quarter) $t$. (Note: This variable is 0 except in February, March, and April (or first and second quarter). It is nonzero in February only for $w > 22$.) Restriction: $1 \le w \le 25$.

Labor Day (variable LABOR(w)):
$L(w,t) = \frac{1}{w}\,[\text{no. of the } w \text{ days before Labor Day that fall in month } t]$. (Note: This variable is 0 except in August and September.) Restriction: $1 \le w \le 25$.

Length-of-month, monthly flow (variable LOM):
$m_t - \bar{m}$, where $m_t$ is the length of month $t$ (in days) and $\bar{m} = 30.4375$ (the average length of a month).

Stock length-of-month (variable LOMSTOCK):
$SLOM_t = \begin{cases} m_t - \bar{m}(l) & \text{for } t = 1 \\ SLOM_{t-1} + m_t - \bar{m} & \text{otherwise} \end{cases}$, where $\bar{m}$ and $m_t$ are defined as in LOM and $\bar{m}(l)$ depends on the leap-year status of the first February in the series ($\bar{m}(l) = 0.375$ when the first February in the series falls in a leap year).

... Pr > |t| is greater than 0.05. This suggests that the Airline Model should be discarded in favor of the more parsimonious differencing model, which has no parameters to estimate.
The Model Selection Criterion

Return to the Develop Models window (Figure 42.9) and notice the Root Mean Square Error button at the right side of the table banner. This is the model selection criterion—the statistic that the system uses to select the best-fitting model. So far in this example you have fit two models and left the default criterion, root mean square error (RMSE), in effect. Because the Airline Model has the smaller value of this criterion, and because smaller values of the RMSE indicate better fit, the system has chosen this model as the forecasting model, as indicated by the check box in the Forecast Model column.
The statistics available as model selection criteria are a subset of the statistics available for informational purposes. To access the entire set, select Options from the menu bar, and then select Statistics of Fit. The Statistics of Fit Selection window appears, as shown in Figure 42.12.

Figure 42.12 Statistics of Fit
By default, five of the better-known statistics are selected. You can select and deselect statistics by clicking the check boxes in the left column. For this exercise, select All, and notice that all the check boxes become checked. Select the OK button to close the window. Now if you choose Statistics of Fit in the Model Viewer window, all of the statistics are shown for the selected model.

To change the model selection criterion, click the Root Mean Square Error button, or select Options from the menu bar and then select Model Selection Criterion. Notice that most of the statistics of fit are shown, but those that are not relevant to model selection, such as the number of observations, are omitted. Select Schwarz Bayesian Information Criterion and click OK. Because this statistic puts a high penalty on models with larger numbers of parameters, the ARIMA(0,1,0)(0,1,0)s model comes out with the better fit. Notice that changing the selection criterion does not automatically select the model that is best according to that criterion. You can always choose the model you want to use for forecasts by selecting its check box in the Forecast Model column.
Now bring up the Model Selection Criterion window again and select Akaike Information Criterion. This statistic puts a smaller penalty on the number of parameters, and the Airline Model comes out as the better-fitting model.
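The difference in penalties can be sketched with the usual textbook forms of these criteria (the exact definitions that the system uses are given in Chapter 46, “Forecasting Process Details”); here $n$ is the number of observations, $k$ the number of estimated parameters, and $\hat{\sigma}^2$ the estimated error variance:

$$ \mathrm{AIC} = n \ln \hat{\sigma}^2 + 2k, \qquad \mathrm{SBC} = n \ln \hat{\sigma}^2 + k \ln n $$

Because $\ln n > 2$ whenever $n > e^2 \approx 7.4$, the Schwarz criterion charges each additional parameter more heavily than the AIC does for all but the shortest series, which is why the two criteria can rank the parameter-free differencing model and the Airline Model differently.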
Sorting and Selecting Models

Select Sort Models from the Tools menu or from the toolbar. This sorts the current list of fitted models by the current selection criterion. Although some selection criteria assign larger values to better-fitting models (for example, R-square) while others assign smaller values, Sort Models always orders the models with the best-fitting model—in this case, the Airline Model—at the top of the list.

When you select a model in the table, its name and criterion value become highlighted, and actions that apply to that model become available. If your system supports a right mouse button, you can click it to invoke a pop-up menu, as shown in Figure 42.13.

Figure 42.13 Right Mouse Button Pop-up Menu
Whether or not you have a right mouse button, the same choices are available under Edit and View from the menu bar. If the model viewer has been invoked, it is automatically updated to show the selected model, unless you have unlinked the viewer by using the Link/Unlink toolbar button.

Select the highlighted model in the table again. Notice that it is no longer highlighted. When no models are highlighted, the right mouse button pop-up menu changes, and items on the menu bar that apply to a selected model become unavailable. For example, you can choose Edit from the menu bar, but you can’t choose the Edit Model or Delete Model selections unless you have highlighted a model in the table.

When you select the check box in the Forecast Model column of the table, the model in that row becomes the forecasting model. This is the model that is used the next time forecasts are generated by choosing View Forecasts or by using the Produce Forecasts window. Note that this forecasting model flag is set automatically when you use Fit Automatic Model or when you fit an individual model that fits better, according to the current selection criterion, than the current forecasting model.
Comparing Models

Select Tools and Compare Models from the menu bar. This displays the Model Fit Comparison table, as shown in Figure 42.14.
Figure 42.14 Model Comparison Window
The two models you have fit are shown as Model 1 and Model 2. When there are more than two models, you can bring any two of them into the table by selecting the up and down arrows. In this way, it is easy to do pairwise comparisons on any number of models, looking at as many statistics of fit as you like. Since you previously chose to display all statistics of fit, all of them are shown in the comparison table. Use the vertical scroll bar to move through the list. After you have examined the model comparison table, select the Close button to return to the Develop Models window.
Controlling the Period of Evaluation and Fit

Notice the three time ranges shown on the Develop Models window (Figure 42.9). The data range shows the beginning and ending dates of the MASONRY time series. The period of fit shows the beginning and ending dates of data used to fit the models. The period of evaluation shows the beginning and ending dates of data used to compute statistics of fit. By default, the fit and evaluation ranges are the same as the data range. To change these ranges, select the Set Ranges
button, or select Options and Time Ranges from the menu bar. This brings up the Time Ranges Specification window, as shown in Figure 42.15.

Figure 42.15 Time Ranges Specification Window
For this example, suppose the early data in the series are unreliable, and you want to use the range June 1978 through the latest date available for both model fitting and model evaluation. You can either type JUN1978 in the From column for Period of Fit and Period of Evaluation, or you can advance these dates by clicking the right-pointing arrows. The outer arrow advances the date by a large amount (in this case, by a year), and the inner arrow advances it by a single period (in this case, by a month). Once you have changed the Period of Fit and the Period of Evaluation to JUN1978 in the From column, select the OK button to return to the Develop Models window. Notice that these time ranges are updated at the top of the window, but the models already fit have not been affected. Your changes to the time ranges affect only subsequently fit models.
Refitting and Reevaluating Models

If you fit the ARIMA(0,1,0)(0,1,0)s and Airline models again in the same way as before, they are added to the model list with the same names but with different values of the model selection criterion. The parameter estimates differ because of the new fit range, and the statistics of fit differ because of the new evaluation range. For this exercise, instead of specifying the models again, refit the existing models by selecting Edit from the menu bar and then selecting Refit Models and All Models. After the models have been refit, you should see the same two models listed in the table but with slightly different values for the selection criterion.

The ARIMA(0,1,0)(0,1,0)s and Airline models have now been fit to the MASONRY series by using data from June 1978 to July 1982, since this is the period of fit you specified. The statistics of fit have been computed for the period of evaluation, which was the same as the period of fit. If you had specified a period of evaluation different from the period of fit, the statistics would have been computed accordingly.

In practice, another common reason for refitting models is the availability of new data. For example, when data for a new month become available for a monthly series, you might add them to the input data set, then invoke the forecasting system, open the project containing models fit previously, and refit the models prior to generating new forecasts. Unless you specify the period of fit and period of evaluation in the Time Ranges Specification window, they default to the full data range of the series found in the input data set at the time of refitting. If you prefer to apply previously fit models to revised data without refitting, use Reevaluate Models instead of Refit Models. This recomputes the statistics of fit by using the current evaluation range, but it does not reestimate the model parameters.
Using Hold-out Samples

One important application of model fitting in which the period of fit differs from the period of evaluation is the use of hold-out samples. With this technique of model evaluation, the period of fit ends at a time point before the end of the data series, and the remainder of the data are held out as a nonoverlapping period of evaluation. With respect to the period of fit, the hold-out sample is a period in the future, used to compare the forecasting accuracy of models fit to past data.

For this exercise, use a hold-out sample of 12 months. Bring up the Time Ranges Specification window again by selecting the Set Ranges button. Set Hold-out Sample to 12 by using the combo box, as shown in Figure 42.16. You can also type in a value. To specify a hold-out sample period in different units, you can use the Periods combo box. In this case, it allows you to select years, instead of periods, as the unit.
Figure 42.16 Specifying the Hold-out Sample Size
Notice that setting the hold-out sample to 12 automatically sets the fit range to JUN1978–JUL1981 and the evaluation range to AUG1981–JUL1982. If you had set the period of fit and period of evaluation to these ranges, the hold-out sample would have been set to 12 periods automatically. Select the OK button to return to the Develop Models window.

Now refit the models again. Select Tools and Compare Models to compare the models now that they have been fit to the period June 1978 through July 1981 and evaluated for the hold-out sample period August 1981 through July 1982. Note that the fit statistics for the hold-out sample are based on one-step-ahead forecasts. (See Statistics of Fit in Chapter 46, “Forecasting Process Details.”) As shown in Figure 42.17, the ARIMA(0,1,0)(0,1,0)s model now seems to provide a better fit to the data than the Airline model does. Note that the results can be quite different if you choose a hold-out sample of a different size.
Figure 42.17 Using 12 Month Hold-out Sample
Chapter 43
Using Predictor Variables

Contents
Linear Trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2742
Time Trend Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2743
Regressors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2747
Adjustments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2750
Dynamic Regressor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2751
Interventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2755
    The Intervention Specification Window . . . . . . . . . . . . . . . . 2756
    Specifying a Trend Change Intervention . . . . . . . . . . . . . . . . 2758
    Specifying a Level Change Intervention . . . . . . . . . . . . . . . . 2760
    Modeling Complex Intervention Effects . . . . . . . . . . . . . . . . 2761
    Fitting the Intervention Model . . . . . . . . . . . . . . . . . . . . . 2763
    Limitations of Intervention Predictors . . . . . . . . . . . . . . . . . 2767
Seasonal Dummies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2767
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2771
Forecasting models predict the future values of a series by using two sources of information: the past values of the series and the values of other time series variables. Other variables used to predict a series are called predictor variables.

Predictor variables that are used to predict the dependent series can be variables in the input data set, such as regressors and adjustment variables, or they can be special variables computed by the system as functions of time, such as trend curves, intervention variables, and seasonal dummies. You can specify seven different types of predictors in forecasting models by using the ARIMA Model or Custom Model Specification windows. You cannot specify predictor variables with the Smoothing Model Specification window. Figure 43.1 shows the menu of options for adding predictors to an ARIMA model; this menu opens when you click the Add button. The Add menu for the Custom Model Specification window is similar.
Figure 43.1 Add Predictors Menu
These types of predictors are as follows:

Linear Trend
adds a variable that indexes time as a predictor series. A straight line time trend is fit to the series by regression when you specify a linear trend.
Trend Curve
provides a menu of various functions of time that you can add to the model to fit nonlinear time trends. The Linear Trend option is a special case of the Trend Curve option for which the trend curve is a straight line.
Regressors
allows you to predict the series by regressing it on other variables in the data set.
Adjustments
allows you to specify other variables in the data set that supply adjustments to the forecast.
Dynamic Regressor
allows you to select a predictor variable from the input data set and specify a complex model for the way that the predictor variable affects the dependent series.
Interventions
allows you to model the effect of special events that “intervene” to change the pattern of the dependent series. Examples of intervention effects are strikes, tax increases, and special sales promotions.
Seasonal Dummies
adds seasonal indicator or “dummy” variables as regressors to model seasonal effects.
You can add any number of predictors to a forecasting model, and you can combine predictor variables with other model options.

The following sections explain these seven kinds of predictors in greater detail and provide examples of their use. The examples illustrate these different kinds of predictors by using series in the SASHELP.USECON data set.

Select the Develop Models button from the main window. Select the data set SASHELP.USECON and select the series PETROL. Then select the View Series Graphically button from the Develop Models window. The plot of the example series PETROL appears, as shown in Figure 43.2.

Figure 43.2 Sales of Petroleum and Coal
Linear Trend

From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Linear Trend from the menu (shown in Figure 43.1). A linear trend is added to the Predictors list, as shown in Figure 43.3.

Figure 43.3 Linear Trend Predictor Specified
The description for the linear trend item shown in the Predictors list has the following meaning. The first part of the description, Trend Curve, describes the type of predictor. The second part, _LINEAR_, gives the variable name of the predictor series. In this case, the variable is a time index that the system computes. This variable is included in the output forecast data set. The final part, Linear Trend, describes the predictor. Notice that the model you have specified consists only of the time index regressor _LINEAR_ and an intercept. Although this window is normally used to specify ARIMA models, in this case no ARIMA model options are specified, and the model is a simple regression on time. Select the OK button. The Linear Trend model is fit and added to the model list in the Develop Models window.
Now open the Model Viewer by using the View Model Graphically toolbar icon or by selecting Model Predictions from the View menu. This displays a plot of the model predictions and the actual series values, as shown in Figure 43.4. The predicted values lie along the least squares trend line.

Figure 43.4 Linear Trend Model
Time Trend Curves

From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Trend Curve from the menu (shown in Figure 43.1). A menu of different kinds of trend curves is displayed, as shown in Figure 43.5.
Figure 43.5 Time Trend Curves Menu
These trend curves work in the same way as the Linear Trend option (which is a special case of a trend curve and one of the choices on the menu), but with the Trend Curve menu you have a choice of various nonlinear time trends. Select Quadratic Trend. This adds a quadratic time trend to the Predictors list, as shown in Figure 43.6.
Figure 43.6 Quadratic Trend Specified
Now select the OK button. The quadratic trend model is fit and added to the list of models in the Develop Models window. The Model Viewer displays a plot of the quadratic trend model, as shown in Figure 43.7.
Figure 43.7 Quadratic Trend Model
This curve does not fit the PETROL series very well, but the View Model plot illustrates how time trend models work. You might want to experiment with different trend models to see what the different trend curves look like.

Some of the trend curves require transforming the dependent series. When you specify one of these curves, a notice is displayed reminding you that a transformation is needed, and the Transformation field is filled in automatically. Therefore, you cannot control the Transformation specification when these kinds of trend curves are specified. See the section “Time Trend Curves” in Chapter 46, “Forecasting Process Details,” for more information about the different trend curves.
Regressors

From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Regressors from the menu (shown in Figure 43.1). This displays the Regressor Selection window, as shown in Figure 43.8. This window allows you to select any number of other series in the input data set as regressors to predict the dependent series.

Figure 43.8 Regressor Selection Window
For this example, select CHEMICAL, Sales: Chemicals and Allied Products, and VEHICLES, Sales: Motor Vehicles and Parts. (Note: You do not need to use the CTRL key when selecting more than one regressor.) Then select the OK button. The two variables you selected are added to the Predictors list as regressor type predictors, as shown in Figure 43.9.
Figure 43.9 Regressors Selected
You must have forecasts of the future values of the regressor variables in order to use them as predictors. To do this, you can specify a forecasting model for each regressor, have the system automatically select forecasting models for the regressors, or supply predicted future values for the regressors in the input data set. Even if you have supplied future values for a regressor variable, the system requires a forecasting model for the regressor. Future values that you supply in the input data set take precedence over predicted values from the regressor’s forecasting model when the system computes the forecasts for the dependent series.

Select the OK button. The system starts to fit the regression model but then stops and displays a warning that the regressors that you selected do not have forecasting models, as shown in Figure 43.10.
Figure 43.10 Regressors Needing Models Warning
If you want the system to create forecasting models automatically for the regressor variables by using the automatic model selection process, select the OK button. If not, you can select the Cancel button to abort fitting the regression model. For this example, select the OK button. The system now performs the automatic model selection process for CHEMICAL and VEHICLES. The selected forecasting models for CHEMICAL and VEHICLES are added to the model lists for those series. If you switch the current time series in the Develop Models window to CHEMICAL or VEHICLES, you will see the model that the system selected for that series. Once forecasting models have been fit for all regressors, the system proceeds to fit the regression model for PETROL. The fitted regression model is added to the model list displayed in the Develop Models window.
Adjustments

An adjustment predictor is a variable in the input data set that is used to adjust the forecast values produced by the forecasting model. Unlike a regressor, an adjustment variable does not have a regression coefficient. No model fitting is performed for adjustments. Nonmissing values of the adjustment series are simply added to the model prediction for the corresponding period. Missing adjustment values are ignored. If you supply adjustment values for observations within the period of fit, the adjustment values are subtracted from the actual values, and the model is fit to these adjusted values.

To add adjustments, select Add and then select Adjustments from the pop-up menu (shown in Figure 43.1). This displays the Adjustments Selection window. The Adjustments Selection window functions the same as the Regressor Selection window (which is shown in Figure 43.8). You can select any number of adjustment variables as predictors. Unlike regressors, adjustments do not require forecasting models for the adjustment variables. If a variable that is used as an adjustment does have a forecasting model fit to it, the adjustment variable’s forecasting model is ignored when the variable is used as an adjustment.

You can use forecast adjustments to account for expected future events that have no precedent in the past and so cannot be modeled by regression. For example, suppose you are trying to forecast the sales of a product, and you know that a special promotional campaign for the product is planned during part of the period you want to forecast. If such sales promotion programs have been frequent in the past, then you can record the past and expected future level of promotional efforts in a variable in the data set and use that variable as a regressor in the forecasting model. However, if this is the first sales promotion of its kind for this product, you have no way to estimate the effect of the promotion from past data. In this case, the best you can do is to make an educated guess at the effect the promotion will have and add that guess to what your forecasting model would predict in the absence of the special sales campaign.

Adjustments are also useful for making judgmental alterations to forecasts. For example, suppose you have produced forecast sales data for the next 12 months. Your supervisor believes that the forecasts are too optimistic near the end and asks you to prepare a forecast graph in which the numbers that you have forecast are reduced by 1000 in the last three months. You can accomplish this task by editing the input data set so that it contains observations for the actual data range of sales plus 12 additional observations for the forecast period, and a new variable called, for example, ADJUSTMENT. The variable ADJUSTMENT contains the value –1000 for the last three observations and is missing for all other observations. You fit the same model previously selected for forecasting by using the ARIMA Model Specification or Custom Model Specification window, but with an adjustment added that uses the variable ADJUSTMENT. Now when you graph the forecasts by using the Model Viewer, the last three periods of the forecast are reduced by 1000. The confidence limits are unchanged, which helps draw attention to the fact that the adjustments to the forecast deviate from what would be expected statistically.
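To summarize the mechanics of adjustments in symbols (a sketch; $\hat{y}_t$ denotes the model prediction for period $t$, $a_t$ the adjustment value, and $y_t$ the actual value):

$$ \tilde{y}_t = \hat{y}_t + a_t \quad \text{for forecast periods with nonmissing } a_t, \qquad \text{and the model is fit to } y_t - a_t \text{ within the period of fit.} $$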
Dynamic Regressor

Selecting Dynamic Regressor from the Add Predictors menu (shown in Figure 43.1) allows you to specify a complex time series model of the way that a predictor variable influences the series that you are forecasting. When you specify a predictor variable as a simple regressor, only the current period value of the predictor affects the forecast for the period. By specifying the predictor with the Dynamic Regression option, you can use past values of the predictor series, and you can model effects that take place gradually.

Dynamic regression models are an advanced feature that you are unlikely to find useful unless you have studied the theory of statistical time series analysis. You might want to skip this section if you are not trained in time series modeling. The term dynamic regression was introduced by Pankratz (1991) and refers to what Box and Jenkins (1976) named transfer function models. In dynamic regression, you have a time series model, similar to an ARIMA model, that predicts how changes in the predictor series affect the dependent series over time. The dynamic regression model relates the predictor variable to the expected value of the dependent series in the same way that an ARIMA model relates the fluctuations of the dependent series about its conditional mean to the random error term (which is also called the innovation series). Refer to Pankratz (1991) and Box and Jenkins (1976) for more information about dynamic regression or transfer function models. See also Chapter 7, “The ARIMA Procedure.”

From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Linear Trend from the menu (shown in Figure 43.1). Now select Add and select Dynamic Regressor. This displays the Dynamic Regressors Selection window, as shown in Figure 43.11.
Figure 43.11 Dynamic Regressors Selection Window
You can select only one predictor series when specifying a dynamic regression model. For this example, select VEHICLES, Sales: Motor Vehicles and Parts. Then select the OK button. This displays the Dynamic Regression Specification window, as shown in Figure 43.12.
Figure 43.12 Dynamic Regression Specification Window
This window consists of four parts. The Input Transformations fields enable you to transform or lag the predictor variable. For example, you might use the lagged logarithm of the variable as the predictor series. The Order of Differencing fields enable you to specify simple and seasonal differencing of the predictor series. For example, you might use changes in the predictor variable instead of the variable itself as the predictor series. The Numerator Factors and Denominator Factors fields enable you to specify the orders of simple and seasonal numerator and denominator factors of the transfer function. Simple regression is a special case of dynamic regression in which the dynamic regression model consists of only a single regression coefficient for the current value of the predictor series. If you select the OK button without specifying any options in the Dynamic Regression Specification window, a simple regressor will be added to the model. For this example, use the Simple Order combo box for Denominator Factors and set its value to 1. The window now appears as shown in Figure 43.13.
Figure 43.13 Distributed Lag Regression Specified
This model is equivalent to regression on an exponentially weighted infinite distributed lag of VEHICLES (in the same way an MA(1) model is equivalent to single exponential smoothing). Select the OK button to add the dynamic regressor to the model predictors list. In the ARIMA Model Specification window, the Predictors list should now contain two items, a linear trend and a dynamic regressor for VEHICLES, as shown in Figure 43.14.
Figure 43.14 Dynamic Regression Model
This model is a multiple regression of PETROL on a time trend variable and an infinite distributed lag of VEHICLES. Select the OK button to fit the model. As with simple regressors, if VEHICLES does not already have a forecasting model, an automatic model selection process is performed to find a forecasting model for VEHICLES before the dynamic regression model for PETROL is fit.
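In the transfer function notation of Box and Jenkins, the fitted model can be sketched as follows (a schematic of the general structure, not the system’s exact parameterization; $a$ and $b$ are the intercept and trend coefficients, $B$ is the backshift operator, and $\omega_0$ and $\delta_1$ are the transfer function coefficients):

$$ \text{PETROL}_t = a + b\,t + \frac{\omega_0}{1 - \delta_1 B}\,\text{VEHICLES}_t + \varepsilon_t $$

Expanding $\omega_0/(1-\delta_1 B) = \omega_0 (1 + \delta_1 B + \delta_1^2 B^2 + \cdots)$ shows why a denominator factor of order 1 is equivalent to an exponentially weighted infinite distributed lag of VEHICLES.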
Interventions

An intervention is a special indicator variable, computed automatically by the system, that identifies time periods affected by unusual events that influence or intervene in the normal path of the time series you are forecasting. When you add an intervention predictor, the indicator variable of the intervention is used as a regressor, and the impact of the intervention event is estimated by regression analysis. To add an intervention to the Predictors list, you must use the Intervention Specification window to specify the time or times that the intervening event took place and to specify the type of intervention.
You can add interventions either through the Interventions item of the Add action or by selecting Tools from the menu bar and then selecting Define Interventions. Intervention specifications are associated with the series. You can specify any number of interventions for each series, and once you define interventions you can select them for inclusion in forecasting models. If you select the Include Interventions option in the Options menu, any interventions that you have previously specified for a series are automatically added as predictors to forecasting models for the series.

From the Develop Models window, invoke the series viewer by selecting the View Series Graphically icon or Series under the View menu. This displays the Time Series Viewer, as was shown in Figure 43.2.

Note that the trend in the PETROL series shows several clear changes in direction. The upward trend in the first part of the series reverses in 1981. There is a sharp drop in the series towards the end of 1985, after which the trend is again upwardly sloped. Finally, in 1991 the series takes a sharp upward excursion but quickly returns to the trend line. You might have no idea what events caused these changes in the trend of the series, but you can use these patterns to illustrate the use of intervention predictors. To do this, you fit a linear trend model to the series, but modify that trend line by adding intervention effects to model the changes in trend you observe in the series plot.
The Intervention Specification Window

From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Linear Trend from the menu (shown in Figure 43.1). Select Add again and then select Interventions. If you have any interventions already defined for the series, this selection displays the Interventions for Series window. However, since you have not previously defined any interventions, this list is empty. Therefore, the system assumes that you want to add an intervention and displays the Intervention Specification window instead, as shown in Figure 43.15.
Figure 43.15 Intervention Specification Window
The top of the Intervention Specification window shows the current series and the label for the new intervention (initially blank). At the right side of the window is a scrollable table showing the values of the series. This table helps you locate the dates of the events you want to model. At the left of the window is an area titled Intervention Specification that contains the options for defining the intervention predictor. The Date field specifies the time that the intervention occurs. You can type a date value in the Date field, or you can set the Date value by selecting a row from the table of series values at the right side of the window. The area titled Type of Intervention controls the kind of indicator variable constructed to model the intervention effect. You can specify the following kinds of interventions:

Point
is used to indicate an event that occurs in a single time period. An example of a point event is a strike that shuts down production for part of a time period. The value of the intervention’s indicator variable is zero except for the date specified.
Step
is used to indicate a continuing event that changes the level of the series. An example of a step event is a change in the law, such as a tax rate increase. The value of the intervention’s indicator variable is zero before the date specified and 1 thereafter.
Ramp
is used to indicate a continuing event that changes the trend of the series. The value of the intervention’s indicator variable is zero before the date specified, and it increases linearly with time thereafter.
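In symbols, with $t_0$ denoting the date specified in the Date field, the three indicator variables can be sketched as follows (one natural parameterization consistent with the descriptions above):

$$ \text{Point: } I_t = \begin{cases} 1 & t = t_0 \\ 0 & \text{otherwise,} \end{cases} \qquad \text{Step: } I_t = \begin{cases} 1 & t \ge t_0 \\ 0 & t < t_0, \end{cases} \qquad \text{Ramp: } I_t = \begin{cases} t - t_0 & t \ge t_0 \\ 0 & t < t_0. \end{cases} $$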
The areas titled Effect Time Window and Effect Decay Pattern specify how to model the effect that the intervention has on the dependent series. These options are not used for simple interventions; they are discussed later in this chapter.
Specifying a Trend Change Intervention

In the Time Series Viewer window, position the mouse over the highest point in 1981 and select the point. This displays the data value, 19425, and the date, February 1981, of that point in the upper-right corner of the Time Series Viewer, as shown in Figure 43.16.

Figure 43.16 Identifying the Turning Point
Now that you know the date on which the trend reversal occurred, enter that date in the Date field of the Intervention Specification window. Select Ramp as the type of intervention. The window should now appear as shown in Figure 43.17.
Figure 43.17 Ramp Intervention Specified
Select the OK button. This adds the intervention to the list of interventions for the PETROL series, and returns you to the Interventions for Series window, as shown in Figure 43.18.
Figure 43.18 Interventions for Series Window
This window allows you to select interventions for inclusion in the forecasting model. Since you need to define other interventions, select the Add button. This returns you to the Intervention Specification window (shown in Figure 43.15).
Specifying a Level Change Intervention

Now add an intervention to account for the drop in the series in late 1985. You can locate the date of this event by selecting points in the Time Series Viewer plot or by scrolling through the data values table in the Intervention Specification window. Use the latter method so that you can see how this works. Scrolling through the table, you see that the drop was from 15262 in December 1985, to 13937 in January 1986, to 12002 in February, to 10834 in March. Since the drop took place over several periods, you could use another ramp type intervention. However, this example represents the drop as a sudden event by using a step intervention, with February 1986 as the approximate time of the drop.
Select the table row for February 1986 to set the Date field. Select Step as the intervention type. The window should now appear as shown in Figure 43.19.

Figure 43.19 Step Intervention Specified
Select the OK button to add this intervention to the list for the series. Since the trend reverses again after the drop, add a ramp intervention for the same date as the step intervention. Select Add from the Interventions for Series window. Enter FEB86 in the Date field, select Ramp, and then select the OK button.
Modeling Complex Intervention Effects

You have now defined three interventions to model the changes in trend and level. The excursion near the end of the series remains to be dealt with. Select Add from the Interventions for Series window. Scroll through the data values and select the date on which the excursion began, August 1990. Leave the intervention type as Point.
The pattern of the series from August 1990 through January 1991 is more complex than a simple shift in level or trend. For this pattern, you need a complex intervention model for an event that causes a sharp rise followed by a rapid return to the previous trend line. To specify this model, use the Effect Time Window and Effect Decay Pattern options.

The Effect Time Window option controls the number of lags of the intervention’s indicator variable used to model the effect of the intervention on the dependent series. For a simple intervention, the number of lags is zero, which means that the effect of the intervention is modeled by fitting a single regression coefficient to the intervention’s indicator variable. When you set the number of lags greater than zero, regression coefficients are fit to lags of the indicator variable. This allows you to model interventions whose effects take place gradually, or to model rebound effects. For example, severe weather might reduce production during one period but cause an increase in production in the following period as producers struggle to catch up. You could model this by using a point intervention with an effect time window of 1 lag. This would fit two coefficients for the intervention, one for the immediate effect and one for the delayed effect.

The Effect Decay Pattern option controls how the effect of the intervention dissipates over time. None specifies that there is no gradual decay: for point interventions, the effect ends immediately; for step and ramp interventions, the effect continues indefinitely. Exp specifies that the effect declines at an exponential rate. Wave specifies that the effect declines like an exponentially damped sine wave (or as the sum of two exponentials, depending on the fit to the data).

If you are familiar with time series analysis, these options might be clearer if you note that together the Effect Time Window and Effect Decay Pattern options define the numerator and denominator orders of a transfer function or dynamic regression model for the indicator variable of the intervention. See the section “Dynamic Regressor” on page 2751 for more information. For this example, select 2 lags as the value of the Effect Time Window option, and select Exp as the Effect Decay Pattern option. The window should now appear as shown in Figure 43.20.
Figure 43.20 Complex Intervention Model
Select the OK button to add the intervention to the list.
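In the transfer function view described above, the specification just completed (a point intervention with an effect time window of 2 lags and an Exp decay pattern) corresponds to numerator lags 1 and 2 and a denominator lag 1 applied to the intervention's indicator variable. The following is a rough PROC ARIMA sketch of that correspondence, reusing the hypothetical POINT_AUG90 indicator from the earlier DATA step; it illustrates the idea and is not the code the system runs.

proc arima data=interventions;
   identify var=sales crosscorr=(point_aug90);
   /* (1,2) are the numerator lags (the 2-lag effect time window);
      /(1) is the denominator lag (the exponential decay pattern)  */
   estimate input=( (1,2)/(1) point_aug90 ) method=ml;
run;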
Fitting the Intervention Model

The Interventions for Series window now contains definitions for four intervention predictors. Select all four interventions, as shown in Figure 43.21.
Figure 43.21 Interventions for Series Window
Select the OK button. This returns you to the ARIMA Model Specification window, which now lists items in the Predictors list, as shown in Figure 43.22.
Figure 43.22 Linear Trend with Interventions Specified
Select the OK button to fit this model. After the model is fit, bring up the Model Viewer. You will see a plot of the model predictions, as shown in Figure 43.23.
Figure 43.23 Linear Trend with Interventions Model
You can use the Zoom In feature to take a closer look at how the complex intervention effect fits the excursion in the series starting in August 1990.
Limitations of Intervention Predictors

Note that the model you have just fit is intended only to illustrate the specification of interventions. It is not intended as an example of good forecasting practice.

The use of continuing (step and ramp type) interventions as predictors has some limitations that you should consider. If you model a change in trend with a simple ramp intervention, then the trend in the data before the date of the intervention has no influence on the forecasts. Likewise, when you use a step intervention, the average level of the series before the intervention has no influence on the forecasts. Only the final trend and level at the end of the series are extrapolated into the forecast period.

If a linear trend is the only pattern of interest, then instead of specifying step or ramp interventions, it would be simpler to adjust the period of fit so that the model ignores the data before the final trend or level change.

Step and ramp interventions are valuable when there are other patterns in the data, such as seasonality, autocorrelated errors, and error variance, that are stable across the changes in level or trend. Step and ramp interventions enable you to fit seasonal and error autocorrelation patterns to the whole series while fitting the trend only to the latter part of the series.

Point interventions are a useful tool for dealing with outliers in the data. A point intervention fits the series value at the specified date exactly, and it has the effect of removing that point from the analysis. When you specify an effect time window, a point intervention exactly fits as many additional points as the number of lags specified.
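The last point is easy to verify outside the system. In the sketch below (hypothetical names again, with PROC REG standing in for the system's model fitting), the PULSE dummy is nonzero for a single observation, so its coefficient absorbs that observation entirely and the residual at that date is exactly zero, just as if the point had been dropped from the fit.

data check;
   set sales;
   pulse = (date = '1aug1990'd);   /* 1 for the outlier observation only */
run;

proc reg data=check;
   model sales = pulse;            /* intercept = mean of the other observations */
   output out=resid r=residual;    /* residual at 1aug1990 is exactly 0 */
run;
quit;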
Seasonal Dummies

A Seasonal Dummies predictor is a special feature that adds seasonal indicator ("dummy") variables to the model to serve as regressors for seasonal effects. From the Develop Models window, select Fit ARIMA Model. From the ARIMA Model Specification window, select Add and then select Seasonal Dummies from the menu (shown in Figure 43.1). A Seasonal Dummies input is added to the Predictors list, as shown in Figure 43.24.
Figure 43.24 Seasonal Dummies Specified
Select the OK button. A model consisting of an intercept and 11 seasonal dummy variables is fit and added to the model list in the Develop Models window. This is effectively a mean model with a separate mean for each month. Now return to the Model Viewer, which displays a plot of the model predictions and actual series values, as shown in Figure 43.25. This is obviously a poor model for this series, but it serves to illustrate how seasonal dummy variables work.
Figure 43.25 Seasonal Dummies Model
Now select the parameter estimates icon, the fifth from the top on the vertical toolbar. This displays the Parameter Estimates table, as shown in Figure 43.26.
Figure 43.26 Parameter Estimates for Seasonal Dummies Model
Since the data for this example are monthly, the Seasonal Dummies option added 11 seasonal dummy variables. These include a dummy regressor variable that is 1.0 for January and 0 for other months, a regressor that is 1.0 only for February, and so forth through November.

Because the model includes an intercept, no dummy variable is added for December. The December effect is measured by the intercept, while the effect of each other season is measured by the sum of the intercept and the estimated regression coefficient for that season's dummy variable.

The same principle applies for other data frequencies: the "Seasonal Dummy 1" parameter always refers to the first period in the seasonal cycle; and, when an intercept is present in the model, there is no seasonal dummy parameter for the last period in the seasonal cycle.
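Written out as code, this parameterization looks like the following sketch (hypothetical data set and variable names; the system builds these regressors internally, so this is for illustration only). The intercept estimates the December mean, and each other month's mean is the intercept plus that month's dummy coefficient.

data dummies;
   set sales;
   array sdum{11} sdummy1-sdummy11;
   do i = 1 to 11;
      sdum{i} = (month(date) = i);   /* sdummy1 = 1 in January, ..., sdummy11 = 1 in November */
   end;
   drop i;
run;

proc reg data=dummies;
   model sales = sdummy1-sdummy11;  /* December, the 12th period, has no dummy */
run;
quit;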
Chapter 44
Command Reference

Contents
    TSVIEW Command and Macro       2773
        Syntax                     2773
        Examples                   2774
    FORECAST Command and Macro     2774
        Syntax                     2775
        Examples                   2778
TSVIEW Command and Macro

The TSVIEW command invokes the Time Series Viewer. This is a component of the Time Series Forecasting System that can also be used as a standalone graphical viewer for any time series data set or view. See the section "Time Series Viewer Window" in Chapter 45, "Window Reference," for more information.

The TSVIEW command must be specified from the command line or an SCL program. If you need to submit from the program editor, use the %TSVIEW macro instead. You can use the macro within a DATA step program, but you must submit it within the SAS windowing environment.

If the TSVIEW command or %TSVIEW macro is issued without arguments, the Series Selection window appears to enable you to select an input data set and series. This is equivalent to selecting "Time Series Viewer" from the Analysis submenu of the Solutions menu.

By specifying the DATA= and VAR= arguments, you can bring up the Time Series Viewer window directly. The ID= and INTERVAL= arguments are useful when the system cannot determine them automatically from the data.
Syntax

The TSVIEW command has the following form:

TSVIEW [options] ;
The %TSVIEW macro has the following form:
%TSVIEW [(option, ..., option)] ;
The following options can be specified for the command and the macro.

DATA=data-set-name
    specifies the name of the SAS data set containing the input data.

VAR=time-series-variable-name
    specifies the series variable name. It must be a numeric variable contained in the data set.

ID=time-id-variable-name
    specifies the time ID variable name for the data set. If the ID= option is not specified, the system attempts to locate the variables named DATE, DATETIME, and TIME in the data set specified by the DATA= option.

INTERVAL=interval-name
    specifies the time ID interval between observations in the data set.
Examples

TSVIEW Command

tsview data=sashelp.air var=air

tsview data=dept.prod var=units id=period interval=qtr
%TSVIEW Macro

%tsview(data=sashelp.air, var=air);

%tsview(data=dept.prod, var=units, id=period, interval=qtr);
FORECAST Command and Macro

The FORECAST command invokes the Time Series Forecasting System. The command must be specified from the command line or an SCL program. If you need to submit from the program editor, use the %FORECAST macro instead. You can use the macro within a DATA step program, but you must submit it within the SAS windowing environment.

If the FORECAST command or %FORECAST macro is issued without arguments, the Time Series Forecasting window appears. This is equivalent to selecting "Time Series Forecasting System" from the Analysis submenu of the Solutions menu.

Using the arguments, it is possible to do the following:
- Bring up the system with information already filled into some of the fields.
- Bring up the system starting at a different window than the default Time Series Forecasting window.
- Run the system in unattended mode so that a task such as creating a forecast data set is accomplished without any user interaction.

By submitting such commands repeatedly from a SAS/AF or SAS/EIS application, it is possible to do "batch" processing for many data sets or by-group processing for many subsets of a data set (a macro sketch of this appears at the end of the examples in this chapter). You can create a project in unattended mode and later open it for inspection interactively. You can also create a project interactively in order to set options, fit a model, or edit the list of models, and then use this project later in unattended mode.

The Forecast Command Builder, a point-and-click SAS/AF application, makes it easy to specify, run, save, and rerun forecasting jobs by using the FORECAST command. To use it, enter the following on the command line (not the program editor):

%FCB
or

AF C=SASHELP.FORCAST.FORCCMD.FRAME
Syntax

The FORECAST command has the following form:

FORECAST [options] ;
The %FORECAST macro has the following form:

%FORECAST [(option, ..., option)] ;
The following options can be specified for the command and the macro.

PROJECT=project-name
    specifies the name of the SAS catalog entry in which forecasting models and other results are stored and from which previously stored results are loaded into the forecasting system.

DATA=data-set-name
    specifies the name of the SAS data set containing the input data.

VAR=time-series-variable-name
    specifies the series variable name. It must be a numeric variable contained in the data set.

ID=time-id-variable-name
    specifies the time ID variable name for the data set. If the ID= option is not specified, the system attempts to locate the variables named DATE, DATETIME, and TIME in the data set specified by the DATA= option. However, it is recommended that you specify the time ID variable whenever you are using the ENTRY= argument.
INTERVAL=interval-name
    specifies the time ID interval between observations in the data set. Commonly used intervals are year, semiyear, qtr, month, semimonth, week, weekday, day, hour, minute, and second. See Chapter 4, "Date Intervals, Formats, and Functions," for information about more complex interval specifications. If the INTERVAL= option is not specified, the system attempts to determine the interval based on the time ID variable. However, it is recommended that you specify the interval whenever you are using the ENTRY= argument.

STAT=statistic
    specifies the name of the goodness-of-fit statistic to be used as the model selection criterion. The default is RMSE. Valid names are as follows:

    sse        sum of squared errors
    mse        mean square error
    rmse       root mean square error
    mae        mean absolute error
    mape       mean absolute percent error
    aic        Akaike information criterion
    sbc        Schwarz Bayesian information criterion
    rsquare    R-square
    adjrsq     adjusted R-square
    rwrsq      random walk R-square
    arsq       Amemiya's adjusted R-square
    apc        Amemiya's prediction criterion
CLIMIT=integer
    specifies the level of the confidence limits to be computed for the forecast. This integer represents a percentage; for example, 925 indicates 92.5% confidence limits. The default is 95; that is, 95% confidence limits.

HORIZON=integer
    specifies the number of periods into the future for which forecasts are computed. The default is 12 periods. The maximum is 9999.

ENTRY=name
    specifies the name of an entry point into the system. Valid names are as follows:

    main
        starts the system at the Time Series Forecasting window (default).

    devmod
        starts the system at the Develop Models window.

    viewmod
        starts the system at the Model Viewer window. Specify a project that contains a forecasting model by using the PROJECT= option. If a project containing a model is not specified, the message "No forecasting model to view" appears.

    viewser
        starts the system at the Time Series Viewer window.

    autofit
        runs the system in unattended mode, fitting a forecasting model automatically and saving it in a project. If PROJECT= is not specified, the default project name SASUSER.FMSPROJ.PROJ is used.

    forecast
        runs the system in unattended mode to generate a forecast data set. The name of this data set is specified by the OUT= parameter. If OUT= is not specified, a window appears to prompt for the name and label of the output data set. If PROJECT= is not specified, the default project name SASUSER.FMSPROJ.PROJ is used. If the project does not exist or does not contain a forecasting model for the specified series, automatic model fitting is performed and the forecast is computed by using the automatically selected model. If the project exists and contains a forecasting model for the specified series, the forecast is computed by using this model. If the series covers a different time range than it did when the project was created, use the REFIT or REEVAL keyword to reset the time ranges.

OUT=argument
    specifies a one- or two-level name of a SAS data set in which forecasts are saved. Use it in conjunction with ENTRY=FORECAST. If omitted, the system prompts for the name of the forecast data set.

KEEP=argument
    specifies the number of models to keep in the project when automatic model fitting is performed. This corresponds to Models to Keep in the Automatic Model Selection Options window. A value greater than 9 indicates that all models are kept. The default is 1.

DIAG=YES|NO
    specifies which models to search with regard to series diagnostics. DIAG=YES causes the automatic model selection process to search only over those models that are consistent with the series diagnostics. DIAG=NO causes the automatic model selection process to search over all models in the selection list, without regard for the series diagnostics. This corresponds to Models to Fit in the Automatic Model Selection Options window. The default is YES.

REFIT=keyword
    (for macro usage) refits a previously saved forecasting model by using the current fit range; that is, it reestimates the model parameters. Refitting also causes the model to be reevaluated (statistics of fit recomputed), and it causes the time ranges to be reset if the data range has changed (for example, if new observations have been added to the series). This keyword has no effect if you do not use the PROJECT= argument to reference an existing project containing a forecasting model. Use the REFIT keyword if you have added new data to the input series and you want to refit the forecasting model and update the forecast by using the new time ranges. Be sure to use the same project, data set, and series names that you used previously.

REEVAL=keyword
    (for macro usage) reevaluates a previously saved forecasting model by using the current evaluation range; that is, it recomputes the statistics of fit. Reevaluating also causes the time ranges to be reset if the data range has changed (for example, if new observations have been added to the series). It does not refit the model parameters. This keyword has no effect if you also specify REFIT, or if you do not use the PROJECT= argument to reference an existing project containing a forecasting model. Use the REEVAL keyword if you have added new data to the input series and want to update your forecast by using a previously fit forecasting model and the same project, data set, and series names that you used previously.
Examples

FORECAST Command

The following command opens the Time Series Forecasting window with the data set name and series name filled in. The time ID variable is also filled in since the data set contains the variable DATE. The interval is filled in because the system recognizes that the observations are monthly.

forecast data=sashelp.air var=air
The following command opens the Time Series Forecasting window with the project, data set name, series, time ID, and interval fields filled in, assuming that the project SAMPROJ was previously saved either interactively or by using unattended mode as depicted below. Previously fit models appear when the Develop Models or Manage Projects window is opened.

forecast project=samproj
The following command runs the system in unattended mode, fitting a model automatically, storing it in the project SAMPROJ in the default catalog SASUSER.FMSPROJ, and placing the forecasts in the data set WORK.SAMPOUT.

forecast data=sashelp.workers var=electric id=date interval=month project=samproj entry=forecast out=sampout
The following command assumes that a new month's data have been added to the data set from the previous example and that an updated forecast is needed that uses the previously fit model. Time ranges are automatically updated to include the new data since the REEVAL keyword is included. Substitute REFIT for REEVAL if you want the system to reestimate the model parameters.

forecast data=sashelp.workers var=electric id=date interval=month project=samproj entry=forecast out=sampout reeval
The following command opens the Model Viewer with the project created in the previous example and with 99 percent confidence limits in the forecast graph.

forecast data=sashelp.workers var=electric id=date interval=month project=samproj entry=viewmod climit=99
The final example illustrates using unattended mode with an existing project that has been defined interactively. In this example, the goal is to add a model to the model selection list, to specify that all models in that list be fit, and to retain all models that are fit successfully.

First open the Time Series Forecasting window and specify a new project name, WORKPROJ. Then select Develop Models, choosing SASHELP.WORKERS as the data set and MASONRY as the series. Now select "Model Selection List" from the Options menu. In the Model Selection List window, click Actions, then Add, and then ARIMA Model. Define the model ARIMA(0,1,0)(0,1,0)s NOINT by setting the differencing value to 1 under both ARIMA Options and Seasonal ARIMA Options. Select OK to save the model and OK to close the Model Selection List window.

Now select "Automatic Fit" from the Options menu. In the Automatic Model Selection Options window, select "All autofit models in selection list" in the Models to Fit radio box, select "All models" from the Models to Keep combo box, and then click OK to close the window. Select "Save Project" from the File menu, and then close the Develop Models window and the Time Series Forecasting window. You now have a project with a new model added to the selection list, options set for automatic model fitting, and one series selected but no models fit.

Now enter the command:

forecast data=sashelp.workers var=electric id=date interval=month project=workproj entry=forecast out=workforc
The system runs in unattended mode to update the project and create the forecast data set WORKFORC. Check the messages in the Log window to find out whether the run was successful and which model was selected for forecasting. To see the forecast data set, issue the command viewtable WORKFORC. To see the contents of the project, open the Time Series Forecasting window, open the project WORKPROJ, and select "Manage Projects." You will see that the variable ELECTRIC was added to the project and has a forecasting model. Select this row in the table and then select List Models from the Tools menu. You will see that all of the models in the selection list that fit successfully are there, including the new model you added to the selection list.
%FORECAST Macro

This example demonstrates the use of the %FORECAST macro to start the Time Series Forecasting System from a SAS program submitted from the Editor window. The SQL procedure is used to create a view of a subset of a products data set. Then the %FORECAST macro is used to produce forecasts.

proc sql;
   create view selprod as
      select * from products
      where type eq 'A'
      order by date;
run;

%forecast(data=selprod, var=amount, id=date, interval=day,
          entry=forecast, out=typea, proj=proda, refit= );
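The same pattern extends to the by-group processing mentioned in the description of the FORECAST command above. The following macro loop is a sketch, not part of the original documentation: it assumes the same hypothetical PRODUCTS data set with a TYPE column, creates a view per group, and runs %FORECAST once per group. Like any %FORECAST call, it must be submitted within the SAS windowing environment.

%macro forecast_by_type(types);
   %local i t;
   %let i = 1;
   %do %while(%length(%scan(&types, &i)) > 0);
      %let t = %scan(&types, &i);
      proc sql;                      /* view for this group, as in the example above */
         create view sel&t as
            select * from products
            where type eq "&t"
            order by date;
      quit;
      %forecast(data=sel&t, var=amount, id=date, interval=day,
                entry=forecast, out=fc&t, proj=proj&t);
      %let i = %eval(&i + 1);
   %end;
%mend forecast_by_type;

%forecast_by_type(A B C)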
Chapter 45
Window Reference

Contents
    Overview                                            2782
    Adjustments Selection Window                        2782
    AR/MA Polynomial Specification Window               2783
    ARIMA Model Specification Window                    2785
    ARIMA Process Specification Window                  2788
    Automatic Model Fitting Window                      2789
    Automatic Model Fitting Results Window              2793
    Automatic Model Selection Options Window            2796
    Custom Model Specification Window                   2797
    Data Set Selection Window                           2801
    Default Time Ranges Window                          2803
    Develop Models Window                               2804
    Differencing Specification Window                   2812
    Dynamic Regression Specification Window             2813
    Dynamic Regressors Selection Window                 2814
    Error Model Options Window                          2815
    External Forecast Model Specification Window        2816
    Factored ARIMA Model Specification Window           2817
    Forecast Combination Model Specification Window     2819
    Forecasting Project File Selection Window           2821
    Forecast Options Window                             2823
    Intervention Specification Window                   2823
    Interventions for Series Window                     2825
    Manage Forecasting Project Window                   2827
    Model Fit Comparison Window                         2833
    Model List Window                                   2834
    Model Selection Criterion Window                    2838
    Model Selection List Editor Window                  2839
    Model Viewer Window                                 2843
    Models to Fit Window                                2849
    Polynomial Specification Window                     2851
    Produce Forecasts Window                            2852
    Regressors Selection Window                         2856
    Save Data As                                        2857
    Save Graph As                                       2859
    Seasonal ARIMA Model Options Window                 2860
    Series Diagnostics Window                           2861
    Series Selection Window                             2862
    Series to Process Window                            2865
    Series Viewer Transformations Window                2866
    Smoothing Model Specification Window                2868
    Smoothing Weight Optimization Window                2870
    Statistics of Fit Selection Window                  2872
    Time ID Creation – 1,2,3 Window                     2873
    Time ID Creation from Several Variables Window      2873
    Time ID Creation from Starting Date Window          2875
    Time ID Creation Using Informat Window               2876
    Time ID Variable Specification Window               2877
    Time Ranges Specification Window                    2878
    Time Series Forecasting Window                      2880
    Time Series Simulation Window                       2882
    Time Series Viewer Window                           2883
Overview

This chapter provides a reference to the various windows of the Time Series Forecasting System. The windows are presented in alphabetical order by name. Each section describes the purpose of the window, how to open it, its controls, fields, and menus.

For windows that have their own menus, there is a description of each menu item under the heading "Menu Bar." These windows also have a toolbar with icons that duplicate the more commonly used menu items. Each icon has a screen tip: a brief description that appears when you hover the mouse cursor over the icon. If you don't see the screen tips, open the SAS Preferences window, under the Options submenu of the Tools menu. Select the View tab and make sure the "Screen tips" check box is checked.
Adjustments Selection Window

Use the Adjustments Selection window to select input variables for use as adjustments to the forecasts and add them to the Predictors list. Invoke this window from the pop-up menu that appears when you select the Add button of the ARIMA Model Specification window or Custom Model Specification window. For more information, see the "Adjustments" section in Chapter 43, "Using Predictor Variables."
Controls and Fields

Dependent
    is the name and variable label of the current series.

Adjustments
    is a table that lists the names and labels in the input data set available for selection as adjustments. The variables you select are highlighted. Selecting a highlighted row again deselects that variable.

OK
    closes the Adjustments Selection window and adds the selected variables as adjustments in the model.

Cancel
    closes the window without adding any adjustments.

Reset
    resets all selections to their initial values upon entry to the window.
AR/MA Polynomial Specification Window

Use these windows to specify the autoregressive and moving-average terms in a factored ARIMA model. Access the AR Polynomial Specification window from the Set button next to the Autoregressive term in the Factored ARIMA Model Specification window. Access the MA Polynomial Specification window from the Set button next to the Moving Average term.
Controls and Fields

List of Polynomials
    lists the polynomials that have been specified. Each polynomial is represented by a comma-delimited list of lag values enclosed in parentheses.

New
    opens the Polynomial Specification window to add a new polynomial to the model.

Edit
    opens the Polynomial Specification window to edit a polynomial that has been selected. If no polynomial is selected, this button is unavailable.

Remove
    removes a selected polynomial from the list. If none are selected, this button is unavailable.

Remove All
    clears the list of polynomials.

Move Up
    moves a selected polynomial up one position in the list. If no polynomial is selected, or the first one is selected, this button is unavailable.

Move Down
    moves a selected polynomial down one position in the list. If no polynomial is selected, or the last one is selected, this button is unavailable.

OK
    closes the window and returns the specified list of polynomials to the Factored ARIMA Model Specification window.

Cancel
    closes the window and discards any changes made to the list of polynomials.
ARIMA Model Specification Window

Use the ARIMA Model Specification window to specify and fit an ARIMA model with or without predictor effects as inputs. Access it from the Develop Models window, where it is invoked from the Fit Model item under Edit in the menu bar, or from the pop-up menu when you click an empty area of the model table.
Controls and Fields

Series
    is the name and variable label of the current series.

Model
    is a descriptive label for the model that you specify. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the options you specify.

ARIMA Options
    specify the orders of the ARIMA model. You can either type in a value or click the arrow to select from a list.

    Autoregressive
        defines the order of the autoregressive part of the model.

    Differencing
        defines the order of simple differencing (for example, first difference or second difference).

    Moving Average
        defines the order of the moving-average part of the model.

Seasonal ARIMA Options
    specify the orders of the seasonal part of the ARIMA model. You can either type in a value or click the arrow to select from a list.

    Autoregressive
        defines the order of the seasonal autoregressive part of the model.

    Differencing
        defines the order of seasonal differencing (for example, first difference or second difference at the seasonal lags).

    Moving Average
        defines the order of the seasonal moving-average part of the model.

Transformation
    defines the series transformation for the model. When a transformation is specified, the ARIMA model is fit to the transformed series, and forecasts are produced by applying the inverse transformation to the ARIMA model predictions. The available transformations are Log, Logistic, Square Root, Box-Cox, and None.

Intercept
    specifies whether a mean or intercept parameter is included in the ARIMA model. By default, the Intercept option is set to No when the model includes differencing and Yes when there is no differencing.

Predictors
    lists the predictor effects included as inputs in the model.

OK
    closes the ARIMA Model Specification window and fits the model.

Cancel
    closes the ARIMA Model Specification window without fitting the model. Any options you specified are lost.

Reset
    resets all options to their initial values upon entry to the ARIMA Model Specification window. This might be useful when editing an existing model specification; otherwise, Reset has the same function as Clear.

Clear
    resets all options to their default values.

Add
    opens a menu of types of predictors to add to the Predictors list.

Delete
    deletes the selected (highlighted) entry from the Predictors list.

Edit
    edits the selected (highlighted) entry in the Predictors list.
Mouse Button Actions

You can select or deselect entries in the Predictors list by clicking them. The selected (highlighted) predictor effect is acted on by the Delete and Edit buttons. Double-clicking on a predictor in the list invokes an appropriate edit action for that predictor.

If you right-click an entry in the Predictors list, the system displays the following menu of actions that encompass the features of the Add, Delete, and Edit buttons.

Add Linear Trend
    adds a Linear Trend item to the Predictors list.

Add Trend Curve
    opens a menu of different time trend curves and adds the curve you select to the Predictors list. Certain trend curve specifications also set the Transformation field.

Add Regressors
    opens the Regressors Selection window to enable you to select other series in the input data set as regressors to predict the dependent series and add them to the Predictors list.

Add Adjustments
    opens the Adjustments Selection window to enable you to select other series in the input data set for use as adjustments to the forecasts and add them to the Predictors list.

Add Dynamic Regressor
    opens the Dynamic Regressor Selection window to enable you to select a series in the input data set as a predictor of the dependent series and also specify a transfer function model for the effect of the predictor series.

Add Interventions
    opens the Interventions for Series window to enable you to define and select intervention effects and add them to the Predictors list.

Add Seasonal Dummies
    adds a Seasonal Dummies predictor item to the Predictors list.

Edit Predictor
    edits the selected (highlighted) entry in the Predictors list.

Delete Predictors
    deletes the selected (highlighted) entry from the Predictors list.
ARIMA Process Specification Window

Use the ARIMA Process Specification window to define ARIMA processes for simulation. Invoke this window from the Add Series button in the Time Series Simulation window.
Controls and Fields

Series Name
    is the variable name for the series to be simulated.

Series Label
    is the variable label for the series to be simulated.

Series Mean
    is the mean of the simulated series.

Transformation
    defines the series transformation.

Simple Differencing
    is the order of simple differencing for the series.

Seasonal Differencing
    is the order of seasonal differencing for the series.

AR Parameters
    is a table of autoregressive terms for the simulated ARIMA process. Enter a value for Factor, Lag, and Value for each term of the AR part of the process you want to simulate. For a non-factored AR model, make the Factor values the same for all terms. For a factored AR model, use different Factor values to group the terms into the factors.

MA Parameters
    is a table of moving-average terms for the simulated ARIMA process. Enter a value for Factor, Lag, and Value for each term of the MA part of the process you want to simulate. For a non-factored MA model, make the Factor values the same for all terms. For a factored MA model, use different Factor values to group the terms into the factors.

OK
    closes the ARIMA Process Specification window and adds the specified process to the Series to Generate list in the Time Series Simulation window.

Cancel
    closes the window without adding to the Series to Generate list. Any options you specified are lost.

Reset
    resets all the fields to their initial values upon entry to the window.

Clear
    resets all the fields to their default values.
Automatic Model Fitting Window

Use the Automatic Model Fitting window to perform automatic model selection on all series or selected series in an input data set. Invoke this window by using the Fit Models Automatically button on the Time Series Forecasting window. Note that you can also perform automatic model fitting, one series at a time, from the Develop Models window.
Controls and Fields

Project
    is the name of the SAS catalog entry in which the results of the model search process are stored.

Input Data Set
    is the name of the current input data set. You can type in a one-level or two-level data set name here.

Browse button
    opens the Data Set Selection window for selecting an input data set.

Time ID
    is the name of the ID variable for the input data set. You can type in the variable name here or use the Select or Create button.

Time ID Select button
    opens the Time ID Variable Specification window.

Time ID Create button
    opens a menu of choices of methods for creating a time ID variable for the input data set. Use this feature if the input data set does not already contain a valid time ID variable.

Interval
    is the time interval between observations (data frequency) in the current input data set. You can type in an interval name or select one by using the combo box pop-up menu.

Series to Process
    indicates the number and names of time series variables for which forecasting model selection will be applied.

Series to Process Select button
    opens the Series to Process window to let you select the series for which you want to fit models.

Selection Criterion
    shows the goodness-of-fit statistic that will be used to determine the best fitting model for each series.

Selection Criterion Select button
    opens the Model Selection Criterion window to enable you to select the goodness-of-fit statistic that will be used to determine the best fitting model for each series.

Run button
    begins the automatic model fitting process.

Models Fit button
    opens the Automatic Model Fitting Results window to display the models fit during the current invocation of the Automatic Model Fitting window. The results appear automatically when model fitting is complete, but this button enables you to redisplay the results window.

Close button
    closes the Automatic Model Fitting window.
Menu Bar

File
    Import Data
        is available if you license SAS/Access software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or data base to a SAS data set for use in the Time Series Forecasting System.

    Export Data
        is available if you license SAS/Access software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or data base.

    Print Setup
        opens the Print Setup window, which allows you to access your operating system print setup.

    Close
        closes the Automatic Model Fitting window.

View
    Input Data Set
        opens a Viewtable window to browse the current input data set.

    Models Fit
        opens the Automatic Model Fitting Results window to show the forecasting models fit during the current invocation of the Automatic Model Fitting window. This is the same as the Models Fit button.

Tools
    Fit Models
        performs the automatic model selection process for the selected series. This is the same as the Run button.

Options
    Default Time Ranges
        opens the Default Time Ranges window to enable you to control how the system sets the time ranges for series.

    Model Selection List
        opens the Model Selection List editor window. Use this action to control the forecasting models considered by the automatic model selection process and displayed in the Models to Fit window.

    Model Selection Criterion
        opens the Model Selection Criterion window, which presents a list of goodness-of-fit statistics and enables you to select the fit statistic that is displayed in the table and used by the automatic model selection process to determine the best fitting model. This action is the same as the Selection Criterion Select button.

    Statistics of Fit
        opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the Statistics of Fit table and available for selection in the Model Selection Criterion menu.

    Forecast Options
        opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations.

    Forecast Data Set
        see the Produce Forecasts window.

    Alignment of Dates
        Beginning
            aligns dates that the system generates to identify forecast observations in output data sets to the beginning of the time intervals.

        Middle
            aligns dates that the system generates to identify forecast observations in output data sets to the midpoints of the time intervals.

        End
            aligns dates that the system generates to identify forecast observations in output data sets to the end of the time intervals.

    Automatic Fit
        opens the Automatic Model Selection Options window, which enables you to control the number of models retained by the automatic model selection process and whether the models considered for automatic selection are subset according to the series diagnostics.

    Tool Bar Type
        Image Only
            displays the toolbar items as icons without text.

        Label Only
            displays the toolbar items as text without icon images.

        Both
            displays the toolbar items with both text and icon images.

    Include Interventions
        controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process. A check mark or filled check box next to this item indicates that the option is turned on.

    Print Audit Trail
        prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on.

    Show Source Statements
        controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
Automatic Model Fitting Results Window

This resizable window displays the models fit by the most recent invocation of the Automatic Model Fitting window. It appears automatically after Automatic Model Fitting runs, and it can be opened repeatedly from that window by using the Models Fit button or by selecting Models Fit from the View menu. Once you exit the Automatic Model Fitting window, the Automatic Model Fitting Results window cannot be opened again until you fit additional models by using Automatic Model Fitting.
Table Contents

The results table displays the series name in the first column and the model label in the second column. If you have chosen to retain more than one model by using the Automatic Model Selection Options window, more than one row appears in the table for each series; that is, there is a row for each model fit.

If you have already fit models to the same series before invoking the Automatic Model Fitting window, those models do not appear here, since the Automatic Model Fitting Results window is intended to show the results of the current operation of Automatic Model Fitting. To see all models that have been fit, use the Manage Projects window.

The third column of the table shows the values of the current model selection criterion statistic. Additional columns show the values of other fit statistics. The set of statistics shown is selectable by using the Statistics of Fit Selection window. The table can be sorted by any column other than Series Name by clicking on the column heading.
Controls and Fields

Graph
    opens the Model Viewer window on the model currently selected in the table.

Stats
    opens the Statistics of Fit Selection window. This controls the set of goodness-of-fit statistics displayed in the table and in other parts of the Time Series Forecasting System.

Compare
    opens the Model Fit Comparison window for the series currently selected in the table. This button is unavailable if the currently selected row in the table represents a series for which fewer than two models have been fit.

Save
    opens an output data set dialog, enabling you to specify a SAS data set to which the contents of the table are saved. Note that this operation saves what you see in the table. If you want to save the models themselves for use in a future session, use the Manage Projects window.

Print
    prints the contents of the table.

Close
    closes the window and returns to the Automatic Model Fitting window.
Menu Bar

File
    Save
        opens an output data set dialog, enabling you to specify a SAS data set to which the contents of the table are saved. This is the same as the Save button.

    Print
        prints the contents of the table. This is the same as the Print button.

    Import Data
        is available if you license SAS/Access software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or data base to a SAS data set for use in the Time Series Forecasting System.

    Export Data
        is available if you license SAS/Access software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or data base.

    Print Setup
        opens the Print Setup window, which allows you to access your operating system print setup.

    Close
        closes the window and returns to the Automatic Model Fitting window.

View
    Model Predictions
        opens the Model Viewer to display a predicted and actual plot for the currently highlighted model.

    Prediction Errors
        opens the Model Viewer to display the prediction errors for the currently highlighted model.

    Prediction Error Autocorrelations
        opens the Model Viewer to display the prediction error autocorrelations, partial autocorrelations, and inverse autocorrelations for the currently highlighted model.

    Prediction Error Tests
        opens the Model Viewer to display graphs of white noise and stationarity tests on the prediction errors of the currently highlighted model.

    Parameter Estimates
        opens the Model Viewer to display the parameter estimates table for the currently highlighted model.

    Statistics of Fit
        opens the Model Viewer window to display goodness-of-fit statistics for the currently highlighted model.

    Forecast Graph
        opens the Model Viewer to graph the forecasts for the currently highlighted model.

    Forecast Table
        opens the Model Viewer to display forecasts for the currently highlighted model in a table.

Tools
    Compare Models
        opens the Model Fit Comparison window to display fit statistics for selected pairs of forecasting models. This item is unavailable until you select a series in the table for which the automatic model fitting run selected two or more models.

Options
    Statistics of Fit
        opens the Statistics of Fit Selection window. This is the same as the Stats button.

    Column Labels
        selects long or short column labels for the table. Long column labels are used by default.

    ID Columns
        freezes or unfreezes the series and model columns. By default they are frozen so that they remain visible when you scroll the table horizontally to view other columns.
Automatic Model Selection Options Window

Use the Automatic Model Selection Options window to control the automatic selection process. This window is available from the Automatic Fit item of the Options menu in the Develop Models window, Automatic Model Fitting window, and Produce Forecasts window.
Controls and Fields

Models to fit
    Subset by series diagnostics
        when selected, causes the automatic model selection process to search only over those models consistent with the series diagnostics.

    All models in selection list
        when selected, causes the automatic model selection process to search over all models in the search list, without regard for the series diagnostics.

Models to keep
    specifies how many of the models tried by the automatic model selection process are retained and added to the model list for the series. You can specify the best fitting model only, the best n models, where n can be 1 through 9, or all models tried.

OK
    closes the window and saves the automatic model selection options you specified.

Cancel
    closes the window without changing the automatic model selection options.
Custom Model Specification Window

Use the Custom Model Specification window to specify and fit an ARIMA model with or without predictor effects as inputs. Access it from the Develop Models window, where it is invoked from the Fit Model item under the Edit menu, or from the pop-up menu when you click an empty area of the model table.
Controls and Fields

Series
    is the name and variable label of the current series.

Model
    is a descriptive label for the model that you specify. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the options you specify.

Transformation
    defines the series transformation for the model. When a transformation is specified, the model is fit to the transformed series, and forecasts are produced by applying the inverse transformation to the resulting forecasts. The following transformations are available:

    Log
        specifies a logarithmic transformation.

    Logistic
        specifies a logistic transformation.

    Square Root
        specifies a square root transformation.

    Box-Cox
        specifies a Box-Cox transform and opens a window to specify the Box-Cox parameter.

    None
        specifies no series transformation.

Trend Model
    controls the model options to model and forecast the series trend. Select from the following:

    Linear Trend
        adds a Linear Trend item to the Predictors list.

    Trend Curve
        opens a menu of different time trend curves and adds the curve you select to the Predictors list.

    First Difference
        specifies differencing the series.

    Second Difference
        specifies second-order differencing of the series.

    None
        specifies no model for the series trend.

Seasonal Model
    controls the model options to model and forecast the series seasonality. Select from the following:

    Seasonal ARIMA
        opens the Seasonal ARIMA Model Options window to enable you to specify an ARIMA model for the seasonal pattern in the series.

    Seasonal Difference
        specifies differencing the series at the seasonal lag.

    Seasonal Dummy Regressors
        adds a Seasonal Dummies predictor item to the Predictors list.

    None
        specifies no seasonal model.

Error Model
    displays the current settings of the autoregressive and moving-average terms, if any, for modeling the prediction error autocorrelation pattern in the series.

Set button
    opens the Error Model Options window to enable you to set the autoregressive and moving-average terms for modeling the prediction error autocorrelation pattern in the series.

Intercept
    specifies whether a mean or intercept parameter is included in the model. By default, the Intercept option is set to No when the model includes differencing and set to Yes when there is no differencing.

Predictors
    is a list of the predictor effects included as inputs in the model.

OK
    closes the Custom Model Specification window and fits the model.

Cancel
    closes the Custom Model Specification window without fitting the model. Any options you specified are lost.

Reset
    resets all options to their initial values upon entry to the Custom Model Specification window. This might be useful when editing an existing model specification; otherwise, Reset has the same function as Clear.

Clear
    resets all options to their default values.

Add
    opens a menu of types of predictors to add to the Predictors list. Select from the following:

    Linear Trend
        adds a Linear Trend item to the Predictors list.

    Trend Curve
        opens a menu of different time trend curves and adds the curve you select to the Predictors list.

    Regressors
        opens the Regressors Selection window to enable you to select other series in the input data set as regressors to predict the dependent series and add them to the Predictors list.

    Adjustments
        opens the Adjustments Selection window to enable you to select other series in the input data set for use as adjustments to the forecasts and add them to the Predictors list.

    Dynamic Regressor
        opens the Dynamic Regressor Selection window to enable you to select a series in the input data set as a predictor of the dependent series and also specify a transfer function model for the effect of the predictor series.

    Interventions
        opens the Interventions for Series window to enable you to define and select intervention effects and add them to the Predictors list.

    Seasonal Dummies
        adds a Seasonal Dummies predictor item to the Predictors list. This is unavailable if the series interval is not one that has a seasonal cycle.

Delete
    deletes the selected (highlighted) entry from the Predictors list.

Edit
    edits the selected (highlighted) entry in the Predictors list.
Mouse Button Actions

You can select or deselect entries in the Predictors list by clicking them. The selected (highlighted) predictor effect is acted on by the Delete and Edit buttons. Double-clicking on a predictor in the list invokes an appropriate edit action for that predictor.
If you right-click an entry in the Predictors list, the system displays a menu of actions that encompass the features of the Add, Delete, and Edit buttons.
Data Set Selection Window

Use this resizable window to select a data set to process by specifying a library and a SAS data set or view. These selections can be made by typing, by selecting from lists, or by a combination of the two. In addition, you can control the time ID variable and time interval, and you can browse the data set.
Access this window by using the Browse button to the right of the Data Set field in the Time Series Forecasting, Automatic Model Fitting, and Produce Forecasts windows. It functions in the same way as the Series Selection window, except that it does not allow you to select or view a time series variable.
Controls and Fields

Library
    is a SAS libname assigned within the current SAS session. If you know the libname associated with the data set of interest, you can type it in this field. If it is a valid choice, it will appear in the Libraries list and will be highlighted. The SAS Data Sets list will be populated with data sets associated with that libname. See also Libraries under Selection Lists.

Data Set
    is the name of a SAS data set (data file or data view) that resides under the selected libname. If you know the name, you can type it in and press Return. If it is a valid choice, it will appear in the SAS Data Sets list and will be highlighted.

Time ID
    is the name of the ID variable for the selected input data set. To specify the ID variable, you can type the ID variable name in this field or select the control arrows to the right of the field.

Time ID Select button
    opens the Time ID Variable Specification window.

Time ID Create button
    opens a menu of methods for creating a time ID variable for the input data set. Use this feature if the data set does not already contain a valid time ID variable.

Interval
    is the time interval between observations (data frequency) in the selected data set. If the interval is not automatically identified by the system, you can type in the interval name or select it from a list by clicking the combo box arrow. For more information about intervals, see Chapter 4, "Date Intervals, Formats, and Functions," in this book.

OK
    closes the Data Set Selection window and makes the selected data set the current input data set.

Cancel
    closes the window without applying any selections made.

Table
    opens a Viewtable window for browsing the selected data set.

Reset
    resets the fields to their initial values upon entry to the window.

Refresh
    updates all fields and lists in the window. If you assign a new libname without exiting the Data Set Selection window, use the Refresh action to update the Libraries list so that it will include the newly assigned libname.
Selection Lists

Libraries
    displays a list of currently assigned libnames. You can select a libname by clicking it with the left mouse button, which is equivalent to typing its name in the Library field. If you cannot locate the library or directory you are interested in, go to the SAS Explorer window, select "New" from the File menu, then select "Library" and "OK." This opens the New Library window. You can also assign a libname by submitting a LIBNAME statement from the Editor window. Select the Refresh button to make the new libname available in the Libraries list.

SAS Data Sets
    displays a list of the SAS data sets (data files or data views) contained in the selected library. You can select one of these by clicking with the left mouse button, which is equivalent to typing its name in the Data Set field. You can double-click a data set name to select it and exit the window.
Default Time Ranges Window

Use the Default Time Ranges window to control how the period of fit and evaluation and the forecasting horizon are determined for each series when you do not explicitly set these ranges for a particular series. Invoke this window from the Options menu of the Develop Models, Automatic Model Fitting, Produce Forecasts, and Manage Forecasting Project windows. The settings you make in this window affect subsequently selected series; they do not alter the time ranges of series you have already selected.
Controls and Fields

Forecast Horizon
specifies the forecast horizon as either a number of periods or years from the last nonmissing data value or as a fixed date. You can type a number or date value in this field. Date values must be entered in a form recognized by a SAS date informat, such as 1jan2010 or jan2010. (See SAS Language Reference: Concepts for information about SAS date informats.) Forecast Horizon Units
indicates whether the value in the forecast horizon field represents periods or years or a date. Click the arrow and select one from the pop-up list. Hold-out Sample Size
specifies that a number of observations, number of years, or percent of the data at the end of the data range be used for the period of evaluation with the remainder of data used as the period of fit. Hold-out Sample Size Units
indicates whether the hold-out sample size represents periods or years or percent of data range. Period of Fit
specifies how much of the data range for a series is to be used as the period of fit for models fit to the series. ALL indicates that all the available data is used. You can specify a number of periods, number of years, or a fixed date, depending on the value of the units field to the right. When you specify a date, the start of the period of fit is the specified date or the first nonmissing
series value, whichever is more recent. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.) When you specify a number of periods or years, the start of the period of fit is computed by counting back that number of periods or years from the end of the data. Period of Fit Units
indicates whether the period-of-fit value represents periods or years or a date. OK
closes the window and stores the specified changes. Cancel
closes the window without saving changes. Any options you specified are lost. Defaults
resets all options to their default values. Reset
resets the options to their initial values upon entry to the window.
Develop Models Window

This resizable window provides access to all of the Forecasting System’s interactive model fitting and graphical tools. Use it to fit forecasting models to an individual time series and choose the best model to use to produce the final forecasts of the series. Invoke this window by using the Develop Models button on the Time Series Forecasting window.
Controls and Fields

Data Set
is the name of the current input data set. Interval
is the time interval (data frequency) for the input data set. Series
is the variable name and label of the current time series. Browse button
opens the Series Selection window to enable you to change the current input data set or series. Data Range
is the date of the first and last nonmissing data values available for the current series in the input data set. Fit Range
is the current period of fit setting. This is the range of data that will be used to fit models to the series. Evaluation Range
is the current period of evaluation setting. This is the range of data that will be used to calculate the goodness-of-fit statistics for models fit to the series. Set Ranges button
opens the Time Ranges Specification window to enable you to change the fit range or evaluation
range. Note: A new fit range is applied when new models are fit or when existing models are refit. A new evaluation range is applied when new models are fit or when existing models are refit or reevaluated. Changing the ranges does not automatically refit or reevaluate any models in the table: Use the Refit Models or Reevaluate Models items under the Edit menu. View Series Graphically icon
opens the Time Series Viewer window to display plots of the current series. View Selected Model Graphically icon
opens the Model Viewer to display graphs and tables for the currently highlighted model. Forecast Model
is the column of the model table that contains check boxes to select which model is used to produce the final forecasts for the current series. Model Title
is the column of the model table that contains the descriptive labels of the forecasting models fit to the current series. Root Mean Square Error (or other statistic name) button
is the button above the right side of the table. It displays the name of the current model selection criterion: a statistic that measures how well each model in the table fits the values of the current series for observations within the evaluation range. Clicking this button opens the Model Selection Criterion window to let you select a different statistic. When you select a statistic, the model table in the Develop Models window is updated to show current values of that statistic.
Menu Bar

File
New Project
opens a dialog that lets you create a new project, assign it a name and description, and make it the active project. Open Project
opens a dialog that lets you select and load a previously saved project. Save Project
saves the current state of the system (including all the models fit to a series) to the current project catalog entry. Save Project as
saves the current state of the system with a prompt for the name of the catalog entry in which to store the information. Clear Project
clears the system, deleting all the models for all series. Save Forecast
writes forecasts from the currently highlighted model to an output data set. Save Forecast As
prompts for an output data set name and saves the forecasts from the currently highlighted model.
Output Forecast Data Set
opens a dialog for specifying the default data set used when you select “Save Forecast.” Import Data
is available if you license SAS/Access software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or data base to a SAS data set for use in the Time Series Forecasting System. Export Data
is available if you license SAS/Access software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or data base. Print Setup
opens the Print Setup window, which enables you to access your operating system print setup. Close
closes the Develop Models window and returns to the main window. Edit
Fit Model
Automatic Fit
invokes the automatic model selection process. Select From List
opens the Models to Fit window. Smoothing Model
opens the Smoothing Model Specification window. ARIMA Model
opens the ARIMA Model Specification window. Custom Model
opens the Custom Model Specification window. Combine Forecasts
opens the Forecast Combination Model Specification window. External Forecasts
opens the External Forecast Model Specification window. Edit Model
enables you to modify the specification of the currently highlighted model in the table and fit the modified model. The new model replaces the current model in the table. Delete Model
deletes the currently highlighted model from the model table.
Refit Models
All Models
refits all models in the table by using data within the current fit range. Selected Model
refits the currently highlighted model by using data within the current fit range. Reevaluate Models
All Models
recomputes statistics of fit for all models in the table by using data within the current evaluation range. Selected Model
recomputes statistics of fit for the currently highlighted model by using data within the current evaluation range. View
Project
opens the Manage Forecasting Project window. Data Set
opens a Viewtable window to display the current input data set. Series
opens the Time Series Viewer window to display plots of the current series. This is the same as the View Series Graphically icon. Model Predictions
opens the Model Viewer to display a predicted versus actual plot for the currently highlighted model. This is the same as the View Selected Model Graphically icon. Prediction Errors
opens the Model Viewer to display the prediction errors for the currently highlighted model. Prediction Error Autocorrelations
opens the Model Viewer to display the prediction error autocorrelations, partial autocorrelations, and inverse autocorrelations for the currently highlighted model. Prediction Error Tests
opens the Model Viewer to display graphs of white noise and stationarity tests on the prediction errors of the currently highlighted model. Parameter Estimates
opens the Model Viewer to display the parameter estimates table for the currently highlighted model. Statistics of Fit
opens the Model Viewer window to display goodness-of-fit statistics for the currently highlighted model.
Forecast Graph
opens the Model Viewer to graph the forecasts for the currently highlighted model. Forecast Table
opens the Model Viewer to display forecasts for the currently highlighted model in a table. Tools
Diagnose Series
opens the Series Diagnostics window to determine the kinds of forecasting models appropriate for the current series. Define Interventions
opens the Interventions for Series window to enable you to edit or add intervention effects for use in modeling the current series. Sort Models
sorts the models in the table by the values of the currently displayed fit statistic. Compare Models
opens the Model Fit Comparison window to display fit statistics for selected pairs of forecasting models. This is unavailable if there are fewer than two models in the table. Generate Data
opens the Time Series Simulation window. This window enables you to simulate ARIMA time series processes and is useful for educational exercises or testing the system; a simulation sketch appears at the end of this section. Options
Time Ranges
opens the Time Ranges Specification window to enable you to change the fit and evaluation time ranges and the forecast horizon. This action is the same as the Set Ranges button. Default Time Ranges
opens the Default Time Ranges window to enable you to control how the system sets the time ranges for series when you do not explicitly set time ranges with the Time Ranges Specification window. Settings made by using this window do not affect series you are already working with; they take effect when you select a new series. Model Selection List
opens the Model Selection List editor window. Use this action to edit the set of forecasting models considered by the automatic model selection process and displayed by the Models to Fit window. Model Selection Criterion
opens the Model Selection Criterion window, which presents a list of goodness-of-fit statistics and enables you to select the fit statistic that is displayed in the table and used by the automatic model selection process to determine the best fitting model. This action is the same as clicking the button above the table which displays the name of the current model selection criterion.
Statistics of Fit
opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the Model Viewer, Automatic Model Fitting Results, and Model Fit Comparison windows and available for selection in the Model Selection Criterion menu. Forecast Options
opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations. Alignment of Dates
Beginning
aligns dates that the system generates to identify forecast observations in output data sets to the beginning of the time intervals. Middle
aligns dates that the system generates to identify forecast observations in output data sets to the midpoints of the time intervals. End
aligns dates that the system generates to identify forecast observations in output data sets to the end of the time intervals. Automatic Fit
opens the Automatic Model Selection Options window, which enables you to control the number of models retained by the automatic model selection process and whether the models considered for automatic selection are subset according to the series diagnostics. Include Interventions
controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process and displayed by the Models to Fit window. When the Include Interventions option is selected, the series interventions are also automatically added to the predictors list when you specify a model in the ARIMA and Custom Models Specification windows. A check mark or filled check box next to this item indicates that the option is turned on. Print Audit Trail
prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on. Show Source Statements
controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
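The Show Source Statements option corresponds to the standard SAS system option statements, which you can also submit yourself:

   options source;     /* print submitted statements in the SAS log */
   options nosource;   /* suppress them */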
Left Mouse Button Actions for the Model Table

When the cursor is over the description of a model in the table, the left mouse button selects (highlights) or deselects that model. On some computer systems, you can double-click to open the Model Viewer window for the selected model. When the cursor is over an empty part of the model table, the left mouse button opens a menu of model fitting choices. These choices are the same as those in the Fit Model submenu of the Edit menu.
Right Mouse Button Actions for the Model Table

When a model in the table is selected, the right mouse button opens a menu of actions that apply to the highlighted model. The actions available in this menu are as follows. View Model
opens the Model Viewer for the selected model. This action is the same as the View Selected Model Graphically icon. View Parameter Estimates
opens the Model Viewer to display the parameter estimates table for the currently highlighted model. This is the same as the Parameter Estimates item in the View menu. View Statistics of Fit
opens the Model Viewer to display a table of goodness-of-fit statistics for the currently highlighted model. This is the same as the Statistics of Fit item in the View menu. Edit Model
enables you to modify the specification of the currently highlighted model in the table and fit the modified model. This is the same as the Edit Model item in the Edit menu. Refit Model
refits the highlighted model by using data within the current fit range. This is the same as the Selected Model item under the Refit Models submenu of the Edit menu. Reevaluate Model
reevaluates the highlighted model by using data within the current evaluation range. This is the same as the Selected Model item under the Reevaluate Models submenu of the Edit menu. Delete Model
deletes the currently highlighted model from the model table. This is the same as the Delete Model item under the Edit menu. View Forecasts
opens the Model Viewer to display the forecasts for the currently highlighted model. This is the same as the Forecast Graph item under the View menu. When the model list is empty or when no model is selected, the right mouse button opens the same menu of model fitting actions as the left mouse button.
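In connection with the Generate Data item described earlier in this section, the following DATA step sketches the kind of ARIMA process that the Time Series Simulation window can generate, here a simple AR(1) with hypothetical coefficient, seed, and series length:

   data work.sim;
      retain y 0;
      do t = 1 to 100;
         date = intnx( 'month', '01jan2000'd, t - 1 );
         y = 0.8 * y + rannor( 12345 );   /* AR(1) with coefficient 0.8 */
         output;
      end;
      format date monyy7.;
      drop t;
   run;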
Differencing Specification Window

Use the Differencing Specification window to specify the list of differencing lags d=(lag, ..., lag) in a factored ARIMA model. To specify a first difference, add the value 1 (d=(1)). To specify a second difference (difference twice at lag 1), add the value 1 again (d=(1,1)). For first differencing at lags 1 and 12, use the values 1 and 12 (d=(1,12)).
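This notation parallels the differencing lists of the ARIMA procedure. For example, the following statements apply first differencing at lags 1 and 12, matching d=(1,12); the data set and variable names are hypothetical.

   proc arima data=mylib.sales;
      identify var=sales(1,12);   /* difference at lag 1 and at lag 12 */
   run;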
Controls and Fields

Lag
specifies a lag value to add to the list. Type in a positive integer or select one by clicking the spin box arrows. Duplicates are allowed. Add
adds the value in the Lag spin box to the list of differencing lags. Remove
deletes a selected lag from the list of differencing lags. OK
closes the window and returns the specified list to the Factored ARIMA Model Specification window. Cancel
closes the window and discards any lags added to the list.
Dynamic Regression Specification Window

Use the Dynamic Regression Specification window to specify a dynamic regression or transfer function model for the effect of the predictor variable. It is invoked from the Dynamic Regressors Selection window.
Controls and Fields

Series
is the name and variable label of the current series. Input Model
is a descriptive label for the dynamic regression model. You can type a label in this field or allow the system to provide the label. If you leave the label blank, a label is generated automatically based on the options you specify. When no options are specified, the label is the name and variable label of the predictor variable. Input Transformation
displays the transformation specified for the predictor variable. When a transformation is specified, the transfer function model is fit to the transformed input variable. Lagging periods
is the pure delay in the effect of the predictor, l. Simple Order of Differencing
is the order of differencing, d. Set this field to 1 to use the changes in the predictor variable. Seasonal Order of Differencing
is the order of seasonal differencing, D. Set this field to 1 to difference the predictor variable at the seasonal lags—for example, to use the year-over-year or week-over-week changes in the predictor variable.
Simple Order Numerator Factors
is the order of the numerator factor of the transfer function, p. Seasonal Order Numerator Factors
is the order of the seasonal numerator factor of the transfer function, P. Simple Order Denominator Factors
is the order of the denominator factor of the transfer function, q. Seasonal Order Denominator Factors
is the order of the seasonal denominator factor of the transfer function, Q. OK
closes the window and adds the dynamic regression model specified to the model predictors list. Cancel
closes the window without adding the dynamic regression model. Any options you specified are lost. Reset
resets all options to their initial values upon entry to the window. This might be useful when editing a predictor specification; otherwise, Reset has the same function as Clear. Clear
resets all options to their default values.
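The settings in this window parallel the INPUT= option of the ESTIMATE statement in PROC ARIMA. For example, a dynamic regression with a pure delay of three periods and first-order numerator and denominator factors for a predictor PRICE might be written as follows; the data set and variable names are hypothetical.

   proc arima data=mylib.sales;
      identify var=sales crosscorr=( price );
      /* 3-period delay, numerator lag 1, denominator lag 1 */
      estimate input=( 3 $ (1)/(1) price );
   run;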
Dynamic Regressors Selection Window

Use the Dynamic Regressors Selection window to select an input variable as a dynamic regressor. Access this window from the pop-up menu which appears when you select the Add button of the ARIMA Model Specification window or Custom Model Specification window.
Controls and Fields

Dependent
is the name and variable label of the current series. Dynamic Regressors
is a table listing the variables in the input data set. Select one variable in this list as the predictor series. OK
opens the Dynamic Regression Specification window for you to specify the form of the dynamic regression for the selected predictor series, and then closes the Dynamic Regressors Selection window and adds the specified dynamic regression to the model predictors list. Cancel
closes the window without adding the dynamic regression model. Any options you specified are lost. Reset
resets all options to their initial values upon entry to the window.
Error Model Options Window

Use the Error Model Options window to specify the autoregressive and moving-average orders for the residual autocorrelation part of a model defined by using the Custom Model Specification window. Access it by using the Set button of that window.
Controls and Fields

ARIMA Options
Use these combo boxes to specify the orders of the ARIMA model. You can either type in a value or click the combo box arrow to select from a pop-up list. Autoregressive
defines the order of the autoregressive part of the model. Moving Average
defines the order of the moving-average term. OK
closes the Error Model Options window and returns to the Custom Model Specification window. Cancel
closes the Error Model Options window and returns to the Custom Model Specification window, discarding any changes made. Reset
resets all options to their initial values upon entry to the window.
External Forecast Model Specification Window

Use the External Forecast Model Specification window to add to the current project forecasts produced externally to the Time Series Forecasting System. To add an external forecast, select a variable from the selection list and choose the OK button. The name of the selected variable will be added to the list of models fit, and the values of this variable will be used as the forecast. For more information, see “Incorporating Forecasts from Other Sources” in the “Specifying Forecasting Models” chapter.
Controls and Fields

OK
closes the window and adds the external forecast to the project. Cancel
closes the window without adding an external forecast to the project. Reset
deselects any selection made in the selection list.
Factored ARIMA Model Specification Window

Use the Factored ARIMA Model Specification window to specify an ARIMA model by using the notation:

p = (lag, ..., lag) ... (lag, ..., lag)
d = (lag, ..., lag)
q = (lag, ..., lag) ... (lag, ..., lag)
where p, d, and q represent autoregressive, differencing, and moving-average terms, respectively. Access it from the Develop Models window, where it is invoked from the Fit Model item under the Edit menu, or from the pop-up menu that appears when you click an empty area of the model table.
The Factored ARIMA Model Specification window is identical to the ARIMA Model Specification window, except that the p, d, and q terms can be specified in a more general way. Only those controls and fields that differ from the ARIMA Model Specification window are described here.
Controls and Fields

Model
is a descriptive label for the model. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the p, d, and q terms that you specify. For example, if you specify p=(1,2,3), d=(1), q=(12) and no intercept, the model label is ARIMA p=(1,2,3) d=(1) q=(12) NOINT. For monthly data, this is equivalent to the model ARIMA(3,1,0)(0,0,1)s NOINT as specified in the ARIMA Model Specification window or the Custom Model Specification window. ARIMA Options
Specifies the ARIMA model in terms of the autoregressive lags (p), differencing lags (d), and moving-average lags (q). Autoregressive
defines the autoregressive part of the model. Select the Set button to open the AR Polynomial
Specification window, where you can add any set of autoregressive lags grouped into any number of factors. Differencing
specifies differencing to be applied to the input data. Select the Set button to open the Differencing Specification window, where you can specify any set of differencing lags. Moving Average
defines the moving-average part of the model. Select the Set button to open the MA Polynomial Specification window, where you can add any set of moving-average lags grouped into any number of factors. Estimation Method
specifies the method used to estimate the model parameters. The Conditional Least Squares and Unconditional Least Squares methods generally require fewer computing resources and are more likely to succeed in fitting complex models. The Maximum Likelihood method requires more resources but provides a better fit in some cases. See also Estimation Details in Chapter 7, “The ARIMA Procedure.”
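In PROC ARIMA terms, the example model ARIMA p=(1,2,3) d=(1) q=(12) NOINT discussed above corresponds to statements like the following, with the METHOD= option selecting the estimation method; the data set and variable names are hypothetical.

   proc arima data=mylib.sales;
      identify var=sales(1);                       /* d=(1) */
      estimate p=(1,2,3) q=(12) noint method=ml;   /* METHOD=CLS and METHOD=ULS are also available */
   run;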
Forecast Combination Model Specification Window

Use the Forecast Combination Model Specification window to produce forecasts by averaging the forecasts of two or more forecasting models. The specified combination of models is added to the model list for the series. Access this window from the Develop Models window whenever two or more models have been fit to the current series. It is invoked by selecting Combine Forecasts from the Fit Model submenu of the Edit menu, or from the pop-up menu which appears when you click an empty part of the model table.
Controls and Fields

Series
is the name and variable label of the current series. Model
is a descriptive label for the model that you specify. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the options you specify. Weight
is a column of the forecasting model table that contains the weight values for each model. The forecasts for the combined model are computed as a weighted average of the predictions from the models in the table, using these weights. Models with missing weight values are not included in the forecast combination. You can type weight values in these fields or you can use other features of the window to set the weights. Model Description
is a column of the forecasting model table that contains the descriptive labels of the forecasting models fit to the current series that are available for combination. Root Mean Square Error (or other statistic name) button
is the button above the right side of the table. It displays the name of the current model selection criterion: a statistic that measures how well each model in the table fits the values of the current series for observations within the evaluation range. Clicking this button opens the Model Selection Criterion window to enable you to select a different statistic. Normalize Weights button
replaces each nonmissing value in the Weights column with the current value divided by the sum of the weights. The resulting weights are proportional to original weights and sum to 1.
Fit Regression Weights button
computes weight values for the models in the table by regressing the series on the predictions from the models. The values in the Weights column are replaced by the estimated coefficients produced by this linear regression. If some weight values are nonmissing and some are missing, only models with nonmissing weight values are included in the regression. If all weights are missing, all models are used. OK
closes the Forecast Combination Model Specification window and fits the model. Cancel
closes the Forecast Combination Model Specification window without fitting the model. Any options you specified are lost. Reset
resets all options to their initial values upon entry to the Forecast Combination Model Specification window. This might be useful when editing an existing model specification; otherwise, Reset has the same function as Clear. Clear
resets all options to their default values.
Mouse Button Actions

You can select or deselect models for inclusion in the combination model by positioning the mouse cursor over the model description and pressing the left mouse button. When you select a model in this way, the weights are automatically updated. The newly selected model is given a weight equal to the average weight of the previously selected models, and all the nonmissing weights are normalized to sum to 1. When you use the mouse to remove a model from the combination, the weight of the deselected model is set to missing and the remaining nonmissing weights are normalized to sum to 1.
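As an illustration of the idea behind the Fit Regression Weights button, the following step regresses the actual series on the predictions from two models; the estimated coefficients play the role of combination weights. This is a sketch with hypothetical data set and variable names, and the system's internal computation might differ in details such as the treatment of the intercept.

   proc reg data=work.predictions;
      model sales = pred1 pred2;   /* coefficients of PRED1 and PRED2 serve as weights */
   run;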
Forecasting Project File Selection Window

Use the Forecasting Project File Selection window to locate and load a previously stored forecasting project. Access it from the project Browse button of the Manage Forecasting Project window or the Time Series Forecasting window or from the Open Project item under the File menu of the Develop Models window.
Selection Lists

Libraries
is a list of currently assigned libraries. When you select a library from this list, the catalogs in that library are shown in the catalog selection list. Catalogs
is a list of catalogs contained in the currently selected library. When you select a catalog from this list, any forecasting project entries stored in that catalog are shown in the projects selection list. Projects
is a list of forecasting project entries contained in the currently selected catalog.
Controls and Fields

OK
closes the window and opens the selected project. Cancel
closes the window without selecting a project. Delete
deletes the selected project file. Reset
restores selections to those which were set before the window was opened.
Forecast Options Window

Use the Forecast Options window to set options to control how forecasts and confidence limits are computed. It is available from the Forecast Options item in the Options menu of the Develop Models, Automatic Model Fitting, Produce Forecasts, and Manage Forecasting Project windows.
Controls and Fields

Confidence Limits
specifies the size of the confidence limits for the forecast values. For example, a value of 0.95 specifies 95% confidence intervals. You can type in a number or select from the pop-up list. Predictions for transformed models
controls how forecast values are computed for models that employ a series transformation. See the section Predictions for Transformed Models in Chapter 46, “Forecasting Process Details,” for more information. The values are as follows. Mean
specifies that forecast values be predictions of the conditional mean of the series. Median
specifies that forecast values be predictions of the conditional median of the series. OK
closes the window and saves the option settings you specified. Cancel
closes the window without changing the forecast options. Any options you specified are lost.
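For reference, a confidence limits setting of 0.95 corresponds to ALPHA=0.05 in the SAS/ETS forecasting procedures. For example, with hypothetical data set and variable names:

   proc arima data=mylib.sales;
      identify var=sales(1);
      estimate q=1;
      forecast lead=12 alpha=0.05;   /* 95% confidence limits for 12 forecast periods */
   run;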
Intervention Specification Window

Use the Intervention Specification window to specify intervention effects that model the impact of unusual events on the series. Access it from the Interventions for Series window. For more information,
see the section “Interventions” on page 2755.
Controls and Fields

Series
is the name and variable label of the current series. Label
is a descriptive label for the intervention effect that you specify. You can type a label in this field or allow the system to provide the label. If you leave the label blank, a label is generated automatically based on the options you specify. Date
is the date that the intervention occurs. You can type a date value in this field, or you can set the date by selecting a row of the data table on the right side of the window. Type of Intervention
Point
specifies that the intervention variable is zero except for the specified date. Step
specifies that the intervention variable is zero before the specified date and a constant 1.0 after the date. Ramp
specifies that the intervention variable is an increasing linear function of time after the date of the intervention and zero before the intervention date.
Number of lags
specifies the numerator order for the transfer function model for the intervention effect. Select a value from the pop-up list. Effect Decay Pattern
specifies the denominator order for the transfer function model for the intervention effect. The value “Exp” specifies a single lag denominator factor; the value “Wave” specifies a two-lag denominator factor. OK
closes the window and adds the intervention effect specified to the series interventions list. Cancel
closes the window without adding the intervention. Any options you specified are lost. Reset
resets all options to their initial values upon entry to the window. This might be useful when editing an intervention specification; otherwise, Reset has the same function as Clear. Clear
resets all options to their default values.
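To make the three intervention types concrete, the following DATA step builds point, step, and ramp variables for a hypothetical January 2001 event. The forecasting system constructs such variables internally; this sketch uses hypothetical data set and variable names.

   data work.interventions;
      set mylib.sales;
      point = ( date  = '01jan2001'd );                        /* 1 only on the intervention date */
      step  = ( date >= '01jan2001'd );                        /* 0 before the date, 1 afterward */
      ramp  = max( 0, intck( 'month', '01jan2001'd, date ) );  /* increases linearly after the date */
   run;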
Interventions for Series Window

Use the Interventions for Series window to create and edit a list of intervention effects that model the impact of unusual events on the series and to select intervention effects as predictors for forecasting models. Access it from the Add button pop-up menu of the ARIMA Model Specification or Custom Model Specification window, or by selecting Define Interventions from the Tools menu in the Develop Models window. For more information, see the section “Interventions” on page 2755.
Controls and Fields

Series
is the name and variable label of the current series. OK
closes the window. If you access this window from the ARIMA Model Specification window or the Custom Model Specification window, any interventions that are selected (highlighted) in the list are added to the model. If you access this window from the Tools menu, all interventions in the list are saved for the current series. Cancel
closes the window without returning a selection or changing the interventions list. Any options you specified are lost. Reset
resets the list as it was on entry to the window. Clear
deletes all interventions from the list. Add
opens the Intervention Specification window to specify a new intervention effect and add it to the list. Delete
deletes the currently selected (highlighted) entries from the list. Edit
opens the Intervention Specification window to edit the currently selected (highlighted) intervention.
Mouse Button Actions

To select or deselect interventions, position the mouse cursor over the intervention’s label in the Interventions list and press the left mouse button. When you position the mouse cursor in the Interventions list and press the right mouse button, a menu containing the actions Add, Delete, and Edit appears. These actions are the same as the Add, Delete, and Edit buttons. Double-clicking on an intervention in the list invokes an Edit action for that intervention specification.
Manage Forecasting Project Window

Use this resizable window to work with collections of series, models, and options called projects. The window contains a project name, a description field, and a table of information about all the series for which you have fit forecasting models. Access it by using the Manage Projects button on the Time Series Forecasting window.
Controls and Fields

Project Name
is the name of the SAS catalog entry in which forecasting models and other results will be stored and from which previously stored results are loaded into the forecasting system. You can specify the project by typing a SAS catalog entry name in this field or by selecting the Browse button to the right of this field. If you specify the name of an existing catalog entry, the information in the project file is loaded. If you specify a one-level name, it is assumed to be the name of a project in the “fmsproj” catalog in the “sasuser” library. For example, typing samproj is equivalent to typing sasuser.fmsproj.samproj.
project Browse button
opens the Forecasting Project File Selection window to enable you to select and load the project from a list of previously stored project files. Description
is a descriptive label for the forecasting project. The description you type in this field will be stored with the catalog entry shown in the Project field if you save the project.
Series List Table

The table of series for which forecasting models have been fit contains the following columns. Series Name
is the name of the time series variable represented in the given row of the table. Series Frequency
is the time interval (data frequency) for the time series. Input Data Set Name
is the input data set that provided the data for the series. Forecasting Model
is the descriptive label for the forecasting model selected for the series. Statistic Name
is the statistic of fit for the forecasting model selected for the series. Number of Models
is the total number of forecasting models fit to the series. If there is more than one model for a series, use the Model List window to see a list of models. Series Label
is the variable label for the series. Time ID Variable Name
is the time ID variable for the input data set for the series. Series Data Range
is the time range of the nonmissing values of the series. Model Fit Range
is the period of fit used for the series.
Model Evaluation Range
is the evaluation period used for the series. Forecast Range
is the forecast period set for the series.
Menu Bar

File
New
opens a dialog which lets you create a new project, assign it a name and description, and make it the active project. Open
opens a dialog that lets you select and load a previously saved project. Close
closes the Manage Forecasting Project window and returns to the main window. Save
saves the current state of the system (including all the models fit to a series) to the current project catalog entry. Save As
saves the current state of the system with a prompt for the name of the catalog entry in which to store the information. Save to Data Set
saves the current project file information in a SAS data set. The contents of the data set are the same as the information displayed in the series list table. Delete
deletes the current project file. Import Data
is available if you license SAS/Access software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or data base to a SAS data set for use in the Time Series Forecasting System. Export Data
is available if you license SAS/Access software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or data base. Print
prints the current project file information. Print Setup
opens the Print Setup window, which allows you to access your operating system print setup.
Edit
Delete Series
deletes all models for the selected (highlighted) row of the table and removes the series from the project. Clear
resets the system, deleting all series and models from the project. Reset
restores the Manage Forecasting Project window to its initial state. View
Data Set
opens a Viewtable window to display the input data set for the selected (highlighted) series. Series
opens the Time Series Viewer window to display plots of the selected (highlighted) series. Model
opens the Model Viewer window to show the current forecasting model for the selected series. Forecast
opens the Model Viewer to display plots of the forecasts produced by the forecasting model for the selected (highlighted) series. Tools
Diagnose Series
opens the Series Diagnostics window to perform the automatic series diagnostic process to determine the kinds of forecasting models appropriate for the selected (highlighted) series. List Models
opens the Model List window for the selected (highlighted) series, which displays a list of all the models that you fit for the series. This action is the same as double-clicking the mouse on the table row. Generate Data
opens the Time Series Simulation window. This window enables you to simulate ARIMA time series processes and is useful for educational exercises or testing the system. Refit Models
All Series
refits all the models for all the series in the project by using data within the current fit range. Selected Series
refits all the models for the currently highlighted series by using data within the current fit range.
Reevaluate Models
All Series
reevaluates all the models for all the series in the project by using data within the current evaluation range. Selected Series
reevaluates all the models for the currently highlighted series by using data within the current evaluation range. Options
Time Ranges
opens the Time Ranges Specification window to enable you to change the fit and evaluation time ranges and the forecast horizon. Default Time Ranges
opens the Default Time Ranges window to enable you to control how the system sets the time ranges for series when you do not explicitly set time ranges with the Time Ranges Specification window. Settings made by using this window do not affect series you are already working with; they take effect when you select a new series. Model Selection List
opens the Model Selection List editor window. Use this to edit the set of forecasting models considered by the automatic model selection process and displayed by the Models to Fit window. Statistics of Fit
opens the Statistics of Fit Selection window, which controls which of the available statistics will be displayed. Forecast Options
opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations. Column Labels
enables you to set long or short column labels. Long labels are used by default. Include Interventions
controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process and displayed by the Model Selection List editor window. When the Include Interventions option is selected, the series interventions are also automatically added to the predictors list when you specify a model in the ARIMA and Custom Models Specification windows. Print Audit Trail
prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on. Show Source Statements
controls whether SAS statements submitted by the forecasting system are printed in
the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
Left Mouse Button Actions

If you select a series in the table by positioning the cursor over the table row and clicking with the left mouse button once, that row of the table is highlighted. Menu bar actions such as Delete Series will apply to the highlighted row of the table. If you select a series in the table by positioning the cursor over the table row and double-clicking with the left mouse button, the system opens the Model List window for that series, which displays a list of all the models that you fit for the series. This is the same as the List Models action under Tools in the menu bar.
Right Mouse Button Actions

Clicking the right mouse button invokes a pop-up menu of actions applicable to the highlighted series. The actions in this menu are as follows. Delete Series
deletes the highlighted series and its models from the project. This is the same as Delete Series in the Edit menu. Refit All Models
refits all models attached to the highlighted series by using data within the current fit range. This is the same as the Selected Series item under Refit Models in the Tools menu. Reevaluate All Models
reevaluates all models attached to the highlighted series by using data within the current evaluation range. This is the same as the Selected Series item under Reevaluate Models in the Tools menu. List Models
invokes the Model List window. This is the same as List Models under the Tools menu. View Series
opens the Time Series Viewer window to display plots of the highlighted series. This is the same as the Series item under the View menu. View Forecasting Model
invokes the Model Viewer window to display the forecasting model for the highlighted series. This is the same as the Model item under the View menu. View Forecast
opens the Model Viewer window to display the forecasts for the highlighted series. This is the same as the Forecast item under the View menu.
Refresh
updates information shown in the Manage Forecasting Project window.
Model Fit Comparison Window

Use the Model Fit Comparison window to compare goodness-of-fit statistics for any two models fit to the current series. Access it from the Tools menu of the Develop Models window and the Automatic Model Fitting Results window whenever two or more models have been fit to the series.
Controls and Fields

Series
identifies the current time series variable. Range
displays the starting and ending dates of the series data range. Model 1
shows the model currently identified as Model 1. Model 1 upward arrow button
enables you to change the model identified as Model 1 if it is not already the first model in the list of models associated with the series. Select this button to cycle upward through the list of models.
Model 1 downward arrow button
enables you to change the model identified as Model 1 if it is not already the last model in the list of models. Select this button to cycle downward through the list of models. Model 2
shows the model currently identified as Model 2. Model 2 upward arrow button
enables you to change the model identified as Model 2 if it is not already the first model in the list of models associated with the series. Select this button to cycle upward through the list of models. Model 2 downward arrow button
enables you to change the model identified as Model 2 if it is not already the last model in the list of models. Select this button to cycle downward through the list of models. Close
closes the Model Fit Comparison window. Save
opens a dialog for specifying the name and label of a SAS data set to which the statistics will be saved. The data set will contain all available statistics and their values for Model 1 and Model 2, as well as a flag variable that is set to 1 for those statistics that were displayed. Print
prints the contents of the table to the SAS Output window. If you find that the contents do not appear immediately in the Output window, you need to set scrolling options. Select “Preferences” under the Options submenu of the Tools menu. In the Preferences window, select the Advanced tab, then set output scroll lines to a number greater than zero. If you want to route the contents to a printer, go to the Output window and select “Print” from the File menu. Statistics
opens the Statistics of Fit Selection window for controlling which statistics are displayed.
Model List Window

This resizable window shows all of the models that have been fit to a particular series in a project. Access it from the Manage Forecasting Project window by selecting a series in the series list table and choosing “List Models” from the Tools menu or by double-clicking the series.
Controls and Fields

Data Set
is the name of the current input data set. Interval
is the time interval (data frequency) for the input data set. Series
is the variable name and label of the current time series. Data Range
is the date of the first and last nonmissing data values available for the current series in the input data set. Fit Range
is the current period of fit setting. This is the range of data that will be used to fit models to the series. It might be different from the fit ranges shown in the table, which were in effect when the models were fit. Evaluation Range
is the current period of evaluation setting. This is the range of data that will be used to calculate the goodness-of-fit statistics for models fit to the series. It might be different from the evaluation ranges shown in the table, which were in effect when the models were fit. View Series Graphically icon
opens the Time Series Viewer window to display plots of the current series.
View Model Graphically icon
opens the Model Viewer to display graphs and tables for the currently highlighted model.
Model List Table

The table of models fit to the series contains columns that show the model label, the fit range and evaluation range used to fit the models, and all of the currently selected fit statistics. You can change the selection of fit statistics by using the Statistics of Fit Selection window. Click on column headings to sort the table by a particular column. If a model is highlighted, clicking with the right mouse button invokes a pop-up menu that provides actions applicable to the highlighted model. It includes the following items. View Model
opens the Model Viewer on the selected model. This is the same as “Model Predictions” under the View menu. View Parameter Estimates
opens the Model Viewer to display the parameter estimates table for the currently highlighted model. This is the same as “Parameter Estimates” under the View menu. View Statistics of Fit
opens the Model Viewer to display the statistics of fit table for the currently highlighted model. This is the same as “Statistics of Fit” under the View menu. Edit Model
opens the appropriate model specification window for changing the attributes of the highlighted model and fitting the modified model. Refit Model
refits the highlighted model using the current fit range. Reevaluate Model
reevaluates the highlighted model using the current evaluation range. Delete Model
deletes the highlighted model from the project. View Forecasts
opens the Model Viewer to show the forecasts for the highlighted model. This is the same as “Forecast Graph” under the View menu.
Menu Bar

File
Save
opens a dialog which lets you save the contents of the table to a specified SAS data set. Import Data
is available if you license SAS/Access software. It opens an Import Wizard, which you
can use to import your data from an external spreadsheet or data base to a SAS data set for use in the Time Series Forecasting System. Export Data
is available if you license SAS/Access software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or data base. Print
sends the contents of the table to a printer as defined through Print Setup. Print Setup
opens the Print Setup window, which allows you to access your operating system print setup. Close
closes the window and returns to the Manage Forecasting Project window. Edit
Edit Model
enables you to modify the specification of the currently highlighted model in the table and fit the modified model. The new model replaces the current model in the table. Refit Model
refits the currently highlighted model using data within the current fit range. Reevaluate Model
recomputes statistics of fit for the currently highlighted model using data within the current evaluation range. Delete Model
deletes the currently highlighted model from the model table. Reset
restores the contents of the Model List window to the state initially displayed. View
Series
opens the Time Series Viewer window to display plots of the current series. This is the same as the View Series Graphically icon. Model Predictions
opens the Model Viewer to display a predicted and actual plot for the currently highlighted model. This is the same as the View Model Graphically icon. Prediction Errors
opens the Model Viewer to display the prediction errors for the currently highlighted model. Prediction Error Autocorrelations
opens the Model Viewer to display the prediction error autocorrelations, partial autocorrelations, and inverse autocorrelations for the currently highlighted model.
Prediction Error Tests
opens the Model Viewer to display graphs of white noise and stationarity tests on the prediction errors of the currently highlighted model. Parameter Estimates
opens the Model Viewer to display the parameter estimates table for the currently highlighted model. Statistics of Fit
opens the Model Viewer window to display goodness-of-fit statistics for the currently highlighted model. Forecast Graph
opens the Model Viewer to graph the forecasts for the currently highlighted model. Forecast Table
opens the Model Viewer to display forecasts for the currently highlighted model in a table. Options
Statistics of Fit
opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the Model Viewer, Automatic Model Fitting Results, and Model Fit Comparison windows and available for selection in the Model Selection Criterion menu. Column Labels
enables you to set long or short column labels. Long labels are used by default.
Model Selection Criterion Window

Use the Model Selection Criterion window to select the model selection criterion statistic used by the automatic selection process to determine the best fitting forecasting model. Model selection criterion statistics are a subset of those shown in the Statistics of Fit Selection window, since some statistics of fit, such as number of observations, are not useful for model selection. This window is available from the Model Selection Criterion item of the Options menu of the Develop Models window, Automatic Model Fitting window, and Produce Forecasts window.
Controls and Fields

Show subset
when selected, lists only those model selection criterion statistics that are selected in the Statistics of Fit Selection window. Show all
when selected, lists all available model selection criterion statistics. OK
closes the window and sets the model selection criterion to the statistic you specified. Cancel
closes the window without changing the model selection criterion.
Model Selection List Editor Window

Use the Model Selection List Editor window to edit the model selection list, including adding your own custom models, and to specify which models in the list are to be used in the automatic fitting process. Access it from the Options menu in the Develop Models, Automatic Model Fitting, Produce Forecasts, and Manage Forecasting Project windows. The window initially displays the current model list for your project. You can modify this set of models in several ways:

Open one or more alternate model lists to replace or append to the current model list. These can be either model lists included with the software or model lists previously saved by you or other users.
Turn the autofit option on or off for individual models. Those that are not flagged for autofit will be available by using the Models to Fit window but not by using automatic model fitting.

Delete models from the list that are not needed for your project.

Reorder the models in the list.

Edit models in the list.

Create a new empty list.

Add new models to the list.

Having modified the current model list, you can save it for future use in several ways:

Save it in a catalog so it can be opened later in the Model Selection List Editor.

Save it as the user default to be used automatically when new projects are created.

Select Close to close the Model Selection List Editor and attach the modified model selection list to the current project. Select Cancel to close the Model Selection List Editor without changing the current project’s model selection list.

Since model selection lists are not bound to specific data sources, care must be taken when including data-specific features such as interventions and regressors. When you add an ARIMA, Factored ARIMA, or Custom model to the list, you can add regressors by selecting from the variables in the current data set. If there is no current data set, you will be prompted to specify a data set so you can select regressors from the series it contains. If you use a model list that has models with a particular regressor name on a data set that does not contain a series of that name, model fitting will fail. However, you can make global changes to the regressor names in the model list by using Set Regressor Names. For example, you might use the list of dynamic regression models found in the SASHELP.FORCAST catalog. It uses the regressor name “price.” If your regressor series is named “x,” you can specify “price” as the current regressor name and “x” as the “change to” name. The change will be applied to all models in the list that contain the specified regressor name. Interventions cannot be defined for models defined from the Model Selection List Editor. However, you can define interventions by using the Intervention Specification window and apply them to your models by turning on the Include Interventions option.
Auto Fit The auto fit column of check boxes enables you to eliminate some of the models from being used in the automatic fitting process without having to delete them from the list. By default, all models are checked, meaning that they are all used for automatic fitting.
Model This column displays the descriptions of all models in the model selection list. You can select one or more models by clicking them. Selected models are highlighted and become the object of the actions Edit, Move, and Delete.
Menu Bar File
New
creates a new empty model selection list. Open
opens a dialog for selecting one or more existing model selection lists to open. If you select multiple lists, they are all opened at once as a concatenated list. This helps you build large specialized model lists quickly by mixing and matching various existing lists
such as the various ARIMA model lists included in SASHELP.FORCAST. By default, the lists you open replace the current model list. Select the “append” radio button if you want to append them to the current model list. Open System Default
opens the default model list supplied with the product. Cancel
exits the window without applying any changes to the current project’s model selection list. Close
closes the window and applies any changes made to the project’s model selection list. Save
opens a dialog for saving the edited model selection list in a catalog of your choice. Save as User Default
saves your edited model list as a default list for new projects. The location of this saved list is shown on the message line. When you create new projects, the system searches for this model list and uses it if it is found. If it is not found, the system uses the original default model list supplied with the product. Edit
Reset
restores the list to its initial state when the window was invoked. Add Model
enables you to add new models to the selection list. You can use the Smoothing Model Specification window, the ARIMA Model Specification window, the Factored ARIMA Model Specification window, or the Custom Model Specification window. Edit Selected
opens the appropriate model specification window for changing the attributes of the highlighted model and adding the modified model to the selection list. The original model is not deleted. Move Selected
enables you to reorder the models in the list. Select one or more models, then select Move Selected from the menu or toolbar. A note appears on the message line: “Select the row after which the selected models are to be moved.” Then select any unhighlighted row in the table. The selected models will be moved after this row. Delete
deletes any highlighted models from the list. This item is not available if no models are selected. Set Regressor Names
opens a dialog for changing all occurrences of a given regressor name in the models of the current model selection list to a name that you specify. Select All
selects all models in the list.
Clear Selections
deselects all models in the list. Select All for Autofit
checks the autofit check boxes of all models in the list. Clear Autofit Selections
deselects the autofit check boxes of all models in the list.
Mouse Button Actions Clicking any model description in the table selects (highlights) that model. Clicking the same model again deselects it. Multiple selections are allowed. Clicking the auto fit check box in any row toggles the associated model’s eligibility for use in automatic model fitting. Clicking the right mouse button opens a pop-up menu.
Model Viewer Window This resizable window provides plots and tables of actual values, model predictions, forecasts, and related statistics. The various plots and tables available are referred to as views. The section “View Selection Icons” on page 2845 explains how to change the view.
You can access Model Viewer in a number of ways, including the View Model Graphically icon of the Develop Models and Model List windows, the Graph button of the Automatic Model Fitting Results window, and the Model item under the View menu in the Manage Forecasting Project window. In addition, you can go directly to a selected view in the Model Viewer window by selecting Model Predictions, Prediction Errors, Statistics of Fit, Prediction Error Autocorrelations, Prediction Error Tests, Parameter Estimates, Forecast Graph, or Forecast Table from the View menu or corresponding toolbar icon or pop-up menu item in the Develop Models, Model List, or Automatic Model Fitting Results windows. The state of the Model Viewer window is controlled by the current model and the currently selected view. You can resize this window, and you can use other windows without closing the Model Viewer window. By default, the Model Viewer window is automatically updated to display the new model when you switch to working with another model (that is, when you highlight a different model). You can unlink the Model Viewer window from the current model selection by selecting the Link/Unlink icon from the window’s horizontal toolbar. See “Link/Unlink” in the section “Toolbar Icons” on page 2844. For more information, see the section “Model Viewer” on page 2655.
Toolbar Icons The Model Viewer window contains a horizontal row of icons called the Toolbar. Corresponding menu items appear under various menus. The function of each icon is explained in the following list.
Zoom in
In the Model Predictions, Prediction Errors, and Forecast Graph views, the Zoom In action changes the mouse cursor into cross hairs that you can use with the left mouse button to define a region of the graph to zoom in on. In the Prediction Error Autocorrelations and Prediction Error Tests views, Zoom In reduces the number of lags displayed. Zoom out
reverses the previous Zoom In action. Link/Unlink viewer
disconnects or connects the Model Viewer window to the model table (Develop Models window, Model List window, or Automatic Model Fitting Results window). When the viewer is linked, selecting another model in the model table causes the model viewer to be updated to show the selected model. When the Viewer is unlinked, selecting another model does not affect the viewer. This feature is useful for comparing two or more models graphically. You can display a model of interest in the Model Viewer, unlink it, then select another model and open another Model Viewer window for that model. Position the viewer windows side by side for convenient comparisons of models, or use the Next Viewer icon or F12 function key to switch between them. Save
saves the contents of the Model Viewer window. By default, an HTML page is created. This enables you to display graphs and tables by using the Results Viewer or publish them on the Web or your intranet. See also “Save Graph As” and “Save Data As” under “Menu Bar” below. Print
prints the contents of the viewer window. Close
closes the Model Viewer window and returns to the window from which it was invoked.
View Selection Icons At the right-hand side of the Model Viewer window is a vertical toolbar to select the view—that is, the kind of plot or table that the viewer displays. Corresponding menu items appear under View in the menu bar. The function of each icon is explained in the following list. Model Predictions
displays a plot of actual series values and model predictions over time. Click individual points in the graph to get a display of the type (actual or predicted), ID value, and data value in the upper right corner of the window. Prediction Errors
displays a plot of model prediction errors (residuals) over time. Click individual points in the graph to get a display of the prediction error value in the upper right corner of the window. Prediction Error Autocorrelations
displays horizontal bar charts of the sample autocorrelation, partial autocorrelation, and inverse autocorrelation functions for the model prediction errors. Overlaid line plots represent confidence limits computed at plus and minus two standard errors. Click any of the bars to display its value.
Prediction Error Tests
displays horizontal bar charts that represent results of white noise and stationarity tests on the model prediction errors. The first bar chart shows the significance probability of the Ljung-Box chi-square statistic computed on autocorrelations up to the given lag. Longer bars favor rejection of the null hypothesis that the series is white noise. Click any of the bars to display an interpretation. The second bar chart shows tests of stationarity of the model prediction errors, where longer bars favor the conclusion that the series is stationary. Each bar displays the significance probability of the augmented Dickey-Fuller unit root test up to the given autoregressive lag. Long bars represent higher levels of significance against the null hypothesis that the series contains a unit root. For seasonal data, a third bar chart appears for seasonal root tests. Click any of the bars to display an interpretation. (A sketch of equivalent batch tests appears after this list.)
displays a table showing model parameter estimates along with standard errors and t tests for the null hypothesis that the parameter is zero. Statistics of Fit
displays a table of statistics of fit for the selected model. The set of statistics shown can be changed by using the Statistics of Fit item under Options in the menu bar. Forecast Graph
displays a plot of actual and predicted values for the series data range, followed by a horizontal reference line and forecasted values with confidence limits. Click individual points in the graph to get a display of the type, date/time, and value of the data point in the upper right corner of the window. Forecast Table
displays a data table with columns containing the date/time, actual, predicted, error (residual), lower confidence limit, and upper confidence limit values, together with any predictor series.
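The white noise and stationarity tests shown in the Prediction Error Tests view can also be run in batch. The following is a minimal sketch, assuming the prediction errors were saved in a data set named WORK.RESID with a variable named ERROR (both names are hypothetical); the IDENTIFY statement of PROC ARIMA prints a Ljung-Box white noise check by default, and the STATIONARITY= option requests augmented Dickey-Fuller unit root tests:

   proc arima data=work.resid;
      /* The white noise check (Ljung-Box) is printed by default. */
      /* ADF=(1,2) requests unit root tests at AR lags 1 and 2.   */
      identify var=error stationarity=(adf=(1,2));
   run;
   quit;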
Menu Bar File
Save Graph
saves the plot displayed in viewer window as a SAS/GRAPH grseg catalog entry. When the current view is a table, this menu item is not available. See also “Save” in the section “Toolbar Icons” on page 2844. If a graphics catalog entry name has not already been specified, this action functions like “Save Graph As.” Save Graph As
saves the current graph as a SAS/GRAPH grseg catalog entry in a SAS catalog that you specify and/or as an Output Delivery System (ODS) object. By default, an HTML page is created, with the graph embedded as a gif image. Save Data
saves the data displayed in the viewer window in a SAS data set, where applicable.
Save Data As
saves the data in a SAS data set that you specify and/or as an Output Delivery System (ODS) object. By default, an HTML page is created, with the data displayed as a table. Import Data
is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System. Export Data
is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database. Print Graph
prints the contents of the viewer window if the current view is a graph. This is the same as the Print toolbar icon. If the current view is a table, this menu item is not available. Print Data
prints the data displayed in the viewer window, where applicable. Print Setup
opens the Print Setup window, which allows you to access your operating system print setup. Print Preview
opens a preview window to show how your plots will appear when printed. Close
closes the Model Viewer window and returns to the window from which it was invoked. Edit
Edit Model
enables you to modify the specification of the current model and to fit the modified model, which is then displayed in the viewer. Refit Model
refits the current model by using data within the current fit range. This action also causes the ranges to be reset if the data range has changed. Reevaluate Model
reevaluates the current model by using data within the current evaluation range. This action also causes the ranges to be reset if the data range has changed. View
See “View Selection Icons” on page 2845. It describes each of the items available under “View,” except “Zoom Way Out.” Zoom Way Out
zooms the plot out as far as it will go, undoing all prior zoom in operations.
Tools
Link Viewer
See “Link/Unlink” in the section “Toolbar Icons” on page 2844. Options
Time Ranges
opens the Time Ranges Specification window to enable you to change the period of fit, period of evaluation, or forecast horizon to be applied to subsequently fit models. Statistics of Fit
opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the statistics of fit table and available for selection in the Model Selection Criterion menu. Forecast Options
opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations. Residual Plot Options
provides a choice of four methods of computing prediction errors for models that include a data transformation. Prediction Errors
computes the difference between the transformed series actual values and model predictions. Normalized Prediction Errors
computes prediction errors in normalized form. Model Residuals
computes the difference between the untransformed series values and the untransformed model predictions. Normalized Model Residuals
computes model residuals in normalized form. Number of Lags
opens a window to enable you to specify the number of lags shown in the Prediction Error Autocorrelations and Prediction Error Tests views. You can also use the Zoom In and Zoom Out actions to control the number of lags displayed. Correlation Probabilities
controls whether the bar charts in the Prediction Error Autocorrelations view represent significance probabilities or values of the correlation coefficient. A check mark or filled check box next to this item indicates that significance probabilities are displayed. In each case the bar graph horizontal axis label changes accordingly.
Include Interventions
controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process. A check mark or filled check box next to this item indicates that the option is turned on. Print Audit Trail
prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on. Show Source Statements
controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
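For reference, this option corresponds to the SOURCE and NOSOURCE SAS system options, which you can also set directly in a SAS session; a minimal sketch:

   options source;    /* print submitted statements in the SAS log */
   options nosource;  /* suppress them                             */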
Mouse Button Actions You can examine the data values of individual points in the Model Predictions, Prediction Errors, and Forecast Graph views of the Model Viewer by clicking the point. The date/time and data values as well as the type (actual, predicted, and so forth) are displayed in a box that appears in the upper right corner of the Viewer window. Click the mouse elsewhere or select any action to dismiss the data box. Similarly, you can display values in the Prediction Error Autocorrelations view by clicking any of the bars. Clicking bars in the Prediction Error Tests view displays a Recommendation for Current View window, which explains the test represented by the bar. When you select the Zoom In action in the Model Predictions, Prediction Errors, and Forecast Graph views, you can use the mouse to define a region of the graph to zoom. Position the mouse cursor at one corner of the region, press the left mouse button, and move the mouse cursor to the opposite corner of the region while holding the left mouse button down. When you release the mouse button, the plot is redrawn to show an expanded view of the data within the region you selected.
Models to Fit Window Use the Models to Fit window to fit models by choosing them from the current model selection list. Access it by using “Fit Models from List” under the Fit Model submenu of the Edit menu in the Develop Models window, or the pop-up menu that appears when you click an empty area of the model table in the Develop Models window. If you want to alter the list of models that appears here, use the Model Selection List Editor window.
To select a model to be fit, use the left mouse button. To select more than one model to fit, drag with the mouse, or select the first model, then press the shift key while selecting the last model. For noncontiguous selections, press the control key while selecting with the mouse. To begin fitting the models, double-click the last selection or select the OK button. If series diagnostics have been performed, the radio box is available. If the Subset by series diagnostics radio button is selected, only those models in the selection list that fit the diagnostic criteria will be shown for selection. If you want to choose models that do not fit the diagnostic criteria, select the Show all models button.
Controls and Fields Show all models
when selected, lists all available models, regardless of the setting of the series diagnostics options. Subset by series diagnostics
when selected, lists only the available models that are consistent with the series diagnostics options. OK
closes the Models to Fit window and fits the selected models. Cancel
closes the window without fitting any models. Any selections you made are lost.
Polynomial Specification Window Use the Polynomial Specification window to add a polynomial to an ARIMA model. The set of lags defined here become a polynomial factor, denoted by a list of lags in parentheses, when you select “OK.” If you accessed this window from the AR Polynomial Specification window, then it is added to the autoregressive part of the model. If you accessed it from the MA Polynomial Specification window, it is added to the moving-average part of the model.
Controls and Fields Lag
specifies a lag value to add to the list. Type in a positive integer or select one by clicking the spin box arrows. Add
adds the value in the Lag spin box to the list of polynomial lags. Duplicate values are not allowed. Remove
deletes a selected lag from the list of polynomial lags. Polynomial Lags
is a list of unique integers that represent lags to be added to the model. OK
closes the window and returns the specified polynomial to the AR or MA polynomial specification window. Cancel
closes the window and discards any polynomial lags added to the list.
Produce Forecasts Window Use the Produce Forecasts window to produce forecasts for the series in the current input data set for which you have fit forecasting models. Access it by using the Produce Forecasts button of the Time Series Forecasting window.
Controls and Fields Input Data Set is the name of the current input data set. To specify the input data set, you can type a one-level or two-level SAS data set name in this field or select the Browse button to the right of the field. Input data set Browse button opens the Data Set Selection window to enable you to select the input data set. Time ID
is the name of the time ID variable for the input data set. To specify this variable, you can type the ID variable name in this field or use the Select button. Time ID Select button opens the Time ID Variable Specification window. Create button
opens a menu of choices of methods for creating a time ID variable for the input data set. Use this feature if the input data set does not already contain a valid time ID variable.
Interval
is the time interval between observations (data frequency) in the current input data set. If the interval is not automatically filled in by the system, you can type in an interval name here, or select one from the pop-up list. Series
indicates the number and names of time series variables for which forecasts will be produced. Series Select button opens the Series to Process window to let you select the series for which you want to produce forecasts. Forecast Output Data Set is the name of the output data set that will contain the forecasts. Type the name of the output data set in this field or click the Browse button. Forecast Output Browse button opens a dialog to let you locate an existing data set to which to save the forecasts. Format
enables you to select one of three formats for the forecast data set: Simple
specifies the simple format for the output data set. The data set contains the time ID variable and the forecast variables and contains one observation per time period. Observations for earlier time periods contain actual values copied from the input data set; later observations contain the forecasts. Interleaved
specifies the interleaved format for the output data set. The data set contains the time ID variable, the variable TYPE, and the forecast variables. There are several observations per time period, with the meaning of each observation identified by the TYPE variable. Concatenated
specifies the concatenated format for the output data set. The data set contains the variable SERIES, the time ID variable, and the variables ACTUAL, PREDICT, ERROR, LOWER, and UPPER. There is one observation per time period per forecast series. The variable SERIES contains the name of the forecast series, and the data set is sorted by SERIES and DATE. (A sketch that prints a data set in this format appears after this list.) Horizon
is the number of periods or years to forecast beyond the end of the input data range. To specify the forecast horizon, you can type a value in this field or select one from the pop-up list. Horizon periods
selects the units to apply to the horizon. By default, the horizon value represents the number of periods. For example, if the interval is month, the horizon represents the number of months to forecast. Depending on the interval, you can also select weeks or years, so that the horizon is measured in those units. Horizon date
is the ending date of the forecast horizon. You can type in a date that uses a form recognized by a SAS date informat, or you can increment or decrement the date shown by using the left and right arrows. The outer arrows change the date by a larger amount than the inner arrows.
The date field and the horizon field reset each other, so you can use either one to specify the forecasting horizon. Run button
produces forecasts for the selected series and stores the forecasts in the specified output SAS data set. Output button
opens a Viewtable window to display the output data set. This button becomes available once the forecasts have been written to the data set. Close button
closes the Produce Forecasts window and returns to the Time Series Forecasting window.
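The concatenated format described under Format above is convenient for BY-group processing because the data set is sorted by SERIES and the time ID. A minimal sketch, assuming (hypothetically) that the forecasts were saved in that format in a data set named WORK.FORECAST with time ID variable DATE:

   proc print data=work.forecast noobs label;
      by series;    /* one group per forecast series */
      id date;
      var actual predict error lower upper;
   run;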
Menu Bar File
Import Data
is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System. Export Data
is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database. Print Setup
opens the Print Setup window, which allows you to access your operating system print setup. Close
closes the Produce Forecasts window and returns to the Time Series Forecasting window. View
Input Data Set
opens a Viewtable window to browse the current input data set. Output Data Set
opens a Viewtable window to browse the output data set. This is the same as the Output button. Tools
Produce Forecasts
produces forecasts for the selected series and stores the forecasts in the specified output SAS data set. This is the same as the Run button.
Options
Default Time Ranges
opens the Default Time Ranges window to enable you to control how the system sets the time ranges when new series are selected. Model Selection List
opens the Model Selection List Editor window. Use this to edit the set of forecasting models considered by the automatic model selection process and displayed by the Models to Fit window. Model Selection Criterion
opens the Model Selection Criterion window, which presents a list of goodness-of-fit statistics and enables you to select the fit statistic that is displayed in the table and used by the automatic model selection process to determine the best fitting model. Statistics of Fit
opens the Statistics of Fit Selection window, which presents a list of statistics that the system can display. Use this action to customize the list of statistics shown in the Statistics of Fit table and available for selection in the Model Selection Criterion window. Forecast Options
opens the Forecast Options window, which enables you to control the widths of forecast confidence limits and control the kind of predicted values computed for models that include series transformations. Forecast Data Set
enables you to select one of three formats for the forecast data set. See Format, which is described previously in this section. Alignment of Dates
Beginning
aligns dates that the system generates to identify forecast observations in output data sets to the beginning of the time intervals. Middle
aligns dates that the system generates to identify forecast observations in output data sets to the midpoints of the time intervals. End
aligns dates that the system generates to identify forecast observations in output data sets to the end of the time intervals. Automatic Fit
opens the Automatic Model Selection Options window, which enables you to control the number of models retained by the automatic model selection process and whether the models considered for automatic selection are subset according to the series diagnostics.
Set Toolbar Type
Image Only
displays the toolbar items as icons without text. Label Only
displays the toolbar items as text without icon images. Both
displays the toolbar items as both text and icon images. Include Interventions
controls whether intervention effects defined for the current series are automatically added as predictors to the models considered by the automatic selection process. A check mark or filled check box next to this item indicates that the option is turned on. Print Audit Trail
prints to the SAS log information about the models fit by the system. A check mark or filled check box next to this item indicates that the audit option is turned on. Show Source Statements
controls whether SAS statements submitted by the forecasting system are printed in the SAS log. When the Show Source Statements option is selected, the system sets the SAS system option SOURCE before submitting SAS statements; otherwise, the system uses the NOSOURCE option. Note that only some of the functions performed by the forecasting system are accomplished by submitting SAS statements. A check mark or filled check box next to this item indicates that the option is turned on.
Regressors Selection Window Use the Regressors Selection window to select one or more time series variables in the input data set to include as regressors in the forecasting model to predict the dependent series. Access it from the pop-up menu that appears when you select the Add button of the ARIMA Model Specification window or Custom Model Specification window.
Controls and Fields Dependent
is the name and variable label of the current series. Regressors
is a table listing the names and labels of the variables in the input data set available for selection as regressors. The variables that you select are highlighted. Selecting a highlighted row again deselects that variable. OK
closes the Regressors Selection window and adds the selected variables as regressors in the model. Cancel
closes the window without adding any regressors. Any selections you made are lost. Reset
resets all options to their initial values upon entry to the window.
Save Data As Use Save Data As from the Time Series Viewer window or the Model Viewer window to save data displayed in a table to a SAS data set or external file. Use Save Forecast As from the Develop Models window to save forecasts and related data, including the series name, model, and interval. It supports append mode, enabling you to accumulate the forecasts of multiple series in a single data set.
To save your data in a SAS data set, provide a library name or assign one by using the Browse button, then provide a data set name or accept the default. Enter a descriptive label for the data set in the Label field. Click OK to save the data set. If you specify an existing data set, it will be overwritten, except in the case of Save Forecast As. External file output takes advantage of the Output Delivery System (ODS) and is designed primarily for creating HTML tables for Web reporting. You can build a set of Web pages quickly and use the ODS Results window to view and organize them. To use this feature, check Save External File in the External File Output box. To set ODS options, click Results Preferences, then select the Results tab in the Preferences dialog. If you have previously saved data of the current type, the system remembers your previous labels and titles. To reuse them, click the arrow button to the right of each of these window fields. Use the Customize button if you need to specify the name of a custom macro that contains ODS statements. The default macro simply runs the PRINT procedure. A custom macro can be used to add PRINT procedure and/or ODS statements to customize the type and organization of output files produced.
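The following is a minimal sketch of such a custom macro; it assumes, hypothetically, that the table to be saved is available as WORK._TABLE_ and that HTML output is wanted. The actual data set name and any parameters passed to your macro depend on your setup:

   %macro mycustom;
      /* hypothetical data set and file names */
      ods html file='forecasts.html';
      proc print data=work._table_ label noobs;
      run;
      ods html close;
   %mend mycustom;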
Save Graph As Use Save Graph As from the Time Series Viewer window or the Model Viewer window to save any of the graphs in a catalog or external file.
To save your graph as a grseg catalog entry, enter a two-level name for the catalog or select Browse to open an Open dialog. Use it to select an existing library or assign a new one, and then select a catalog to contain the graph. Click the Open button to open the catalog and close the dialog. Then enter a graphics entry name (eight characters or less) and a label, or accept the defaults, and click the OK button to save the graph. External file output takes advantage of the Output Delivery System (ODS) and is designed primarily for creating gif images and HTML for Web reporting. You can build a set of Web pages that contain graphs and use the Results window to view and organize them. To use this feature, check Save External File in the External File Output box. To set ODS options, click Results Preferences, then select the Results tab in the Preferences dialog. If you have previously saved graphs of the current type, the system remembers your previous labels and titles. To reuse them, click the arrow button to the right of each of these window fields. Use the Customize button if you need to specify the name of a custom macro that contains ODS statements. The default macro simply runs the GREPLAY procedure. Users familiar with ODS
might want to add statements to the macro to customize the type and organization of output files produced.
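As a minimal sketch of what such a customized macro might contain, assume (hypothetically) that a graph was saved as entry MYPLOT in catalog SASUSER.GRAPHS and that an HTML page with the replayed graph is wanted:

   %macro mygraph;
      /* hypothetical catalog, entry, and file names */
      ods html file='plot.html' gpath='.';
      proc greplay igout=sasuser.graphs nofs;
         replay myplot;
      run;
      quit;
      ods html close;
   %mend mygraph;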
Seasonal ARIMA Model Options Window Use the Seasonal ARIMA Model Options window to specify the autoregressive, differencing, and moving-average orders for the seasonal part of a model defined by using the Custom Model Specification window. Access it by selecting “Seasonal ARIMA...” from the Seasonal Model combo box of that window.
Controls and Fields ARIMA Options
Use these combo boxes to specify the orders of the ARIMA model. You can either type in a value or click the combo box arrow to select from a pop-up list. Autoregressive
defines the order of the seasonal autoregressive part of the model. Differencing
defines the order of seasonal differencing. Moving Average
defines the order of the seasonal moving-average term. OK
closes the Seasonal ARIMA Model Options window and returns to the Custom Model Specification window.
Cancel
closes the Seasonal ARIMA Model Options window and returns to the Custom Model Specification window, discarding any changes made. Reset
resets all options to their initial values upon entry to the window.
Series Diagnostics Window Use the Series Diagnostics window to set options to limit the kinds of forecasting models considered for the series according to series characteristics. Access it by selecting “Diagnose Series” from the Tools menu in the Develop Models, Manage Project, and Time Series Viewer windows. You can let the system diagnose the series characteristics automatically, or you can specify series characteristics according to your judgment by using the radio buttons.
For each of the options Log Transform, Trend, and Seasonality, the value “Yes” means that only models appropriate for series with that characteristic should be considered. The value “No” means that only models appropriate for series without that characteristic should be considered. The value “Maybe” means that models should be considered without regard for that characteristic.
Controls and Fields Series
is the name and variable label of the current series.
Series Characteristics
Log Transform
specifies whether forecasting models with or without a logarithmic series transformation are appropriate for the series. Trend
specifies whether forecasting models with or without a trend component are appropriate for the series. Seasonality
specifies whether forecasting models with or without a seasonal component are appropriate for the series. Automatic Series Diagnostics
performs the automatic series diagnostic process. The options Log Transform, Trend, and Seasonality are set according to the results of statistical tests. OK
closes the Series Diagnostics window. Cancel
closes the Series Diagnostics window without changing the series diagnostics options. Any options you specified are lost. Reset
resets all options to their initial values upon entry to the Series Diagnostics window. Clear
resets all options to their default values.
Series Selection Window Use this resizable window to select a time series variable by specifying a library, a SAS data set or view, and a variable. These selections can be made by typing, by selecting from lists, or by a combination of the two. In addition, you can control the time ID variable and time interval, and you can browse the data set or view plots of the series from this window.
This window appears automatically when you select the View Series Graphically or Develop Models buttons in the Time Series Forecasting window and no series has been selected, and when you open the Time Series Viewer as a standalone tool. It is also invoked by using the Browse button in the Develop Models window. The system requires that series names be unique for each frequency (interval) within the forecasting project. If you select a series from the current input data set that already exists in the project with the same interval but a different input data set name, the system warns you and gives you the option to cancel the selection, to refit all models associated with the series by using the data from the current input data set, to delete the models for the series, or to inherit the existing models.
Controls and Fields Library
is a SAS libname assigned within the current SAS session. If you know the libname associated with the data set of interest, you can type it in this field and press Return. If it is a valid choice, it will appear in the libraries list and will be highlighted. The SAS Data Sets list will be populated with data sets associated with that libname. Data Set
is the name of a SAS data set (data file or data view) that resides under the selected libname. If you know the name, you can type it in and press Return. If it is a valid choice, it will appear in the SAS Data Sets list and will be highlighted, and the Time Series Variables list will be populated with the numeric variables in the data set. Variable
is the name of a numeric variable contained in the selected data set. You can type the variable name in this field or you can select the variable with the mouse from the Time Series Variables list.
Time ID
is the name of the ID variable for the input data set. To specify the ID variable, you can type the ID variable name in this field or click the Select button. Select button
opens the Time ID Variable Specification window to let you select an existing variable in the data set as the Time ID. Create button
opens a menu of methods for creating a time ID variable for the input data set. Use this feature if the data set does not already contain a valid time ID variable. Interval
is the time interval between observations (data frequency) in the selected data set. If the interval is not automatically filled in by the system, you can type in an interval name or select one from the pop-up list. For more information about intervals, see Chapter 4, “Date Intervals, Formats, and Functions,” in this book. OK
This button is present when you have selected “Develop Models” from the Time Series Forecasting window. It closes the Series Selection window and makes the selected series the current series. Close
If you have selected the View Series Graphically icon from the Time Series Forecasting window, this button returns you to that window. If you have selected a series, it remains selected as the current series. If you are using the Time Series Viewer as a standalone application, this button closes the application. Cancel
This button is present when you have selected “Develop Models” from the Time Series Forecasting window. It closes the Series Selection window without applying any selections made. Reset
resets the fields to their initial values at entry to the window. Table
opens a Viewtable window for browsing the selected data set. This can assist you in locating the variable containing data you are looking for. Graph
opens the Time Series Viewer window to display the selected time series variable. You can switch to a different series in the Series Selection window without closing the Time Series Viewer window. Position the windows so they are both visible, or use the Next Viewer toolbar icon or F12 function key to switch between windows. Refresh
updates all fields and lists on the window. If you assign a new libname without exiting the Series Selection window, use the refresh action to update the Libraries list so that it will include the newly assigned libname. Also use the Refresh action to update the variables list if the input data set is changed.
Selection Lists Libraries
displays a list of currently assigned libnames. You can select a libname by clicking it, which is equivalent to typing its name in the Library field. If you cannot locate the library or directory you are interested in, go to the SAS Explorer window, select “New” from the File menu, then select “Library” and “OK.” This opens the New Library dialog window. You can also assign a libname by submitting a LIBNAME statement from the Editor window (a sample statement appears after this list). Select the Refresh button to make the new libname available in the libraries list. SAS Data Sets
displays a list of the SAS data sets (data files or data views) located under the selected libname. You can select one of these by clicking it, which is equivalent to typing its name in the Data Set field. Time Series Variables
displays a list of numeric variables contained within the selected data set. You can select one of these by clicking it, which is equivalent to typing its name in the Variable field. You can double-click a series to select it and exit the window.
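For reference, a LIBNAME statement such as the following (the libref and path are hypothetical) assigns a new library; select Refresh afterward so that it appears in the Libraries list:

   libname mylib 'c:\timeseries\data';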
Series to Process Window Use the Series to Process window to select series for model fitting or forecasting. Access it by using the Select button in the Automatic Model Fitting and Produce Forecasts windows. Hold down the shift key or drag with the left mouse button for multiple selections. Use the control key for noncontiguous multiple selections. Once you make selections and select OK, the number of selected series and their names are listed in the Series to Process field of the calling window (with ellipses if not all the names will fit). When invoked from Automatic Model Fitting, the Series to Process window shows all the numeric variables in the input data set except the time ID variable. These are the series that are currently available for model fitting. When invoked from Produce Forecasts, the Series to Process window shows all the series in the input data set for which models have been fit. These are the series that are currently available for forecasting.
Controls and Fields OK
closes the window and applies the series selection(s) which have been made. At least one series must be selected. Cancel
closes the window, ignoring series selections which have been made, if any. Clear
deselects all series in the selection list. All
selects all series in the selection list.
Series Viewer Transformations Window Use the Series Viewer Transformations window to view plots of transformations of the current series in the Time Series Viewer window. It provides a larger set of transformations than those available from the viewer window’s toolbar. It is invoked by using “Other Transformations” under the Tools menu of the Time Series Viewer window. The options that you specify in this window are applied to the series displayed in the Time Series Viewer window when you select “OK” or “Apply.” Use the Apply button if you want to make repeated transformations to a series without having to close and reopen the Series Viewer Transformations window each time.
Controls and Fields Series
is the variable name for the current time series. Transformation
is the transformation applied to the time series displayed in the Time Series Viewer window. Select Log, Logistic, Square Root, Box-Cox, or none from the pop-up list. Simple Differencing
is the order of differencing applied to the time series displayed in the Time Series Viewer window. Select a number from 0 to 5 from the pop-up list. Seasonal Differencing
is the order of seasonal differencing applied to the time series displayed in the Time Series Viewer window. Select a number from 0 to 3 from the pop-up list. Percent Change
is a check box that, if selected, displays the series in terms of percent change from the previous period. (A DATA step sketch of this and related transformations appears after this list.) Additive Decomposition
is a check box that produces a display of a selected series component derived by using additive decomposition. Multiplicative Decomposition
is a check box that produces a display of a selected series component derived using multiplicative decomposition. Component
selects a series component to display when either additive or multiplicative decomposition is
turned on. You can display the seasonally adjusted component, the trend-cycle component, the seasonal component, or the irregular component—that is, the residual that remains after removal of the other components. The heading in the viewer window shows which component is currently displayed. OK
applies the transformation options you selected to the series displayed in the Time Series Viewer window and closes the Series Viewer Transformations window. Cancel
closes the Series Viewer Transformations window without changing the series displayed by the Time Series Viewer window. Apply
applies the transformation options you selected to the series displayed in the Time Series Viewer window without closing the Series Viewer Transformations window. Reset
resets the transformation options to their initial values upon entry to the Series Viewer Transformations window. Clear
resets the transformation options to their default values (no transformations).
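For reference, the log transformation, simple differencing, and percent change options correspond to DATA step computations like those in this sketch (the data set and variable names are hypothetical):

   data work.transformed;
      set work.sales;
      logy   = log(y);                  /* log transformation  */
      d1y    = dif(y);                  /* simple differencing */
      pctchg = 100 * dif(y) / lag(y);   /* percent change      */
   run;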
Smoothing Model Specification Window Use the Smoothing Model Specification window to specify and fit exponential smoothing and Winters method models. Access it from the Develop Models window by using the Fit Model submenu of the Edit menu or from the pop-up menu when you click an empty area of the model table.
Controls and Fields Series
is the name and variable label of the current series. Model
is a descriptive label for the model that you specify. You can type a label in this field or allow the system to provide a label. If you leave the label blank, a label is generated automatically based on the options you specify. Smoothing Methods
Simple Smoothing
specifies simple (single) exponential smoothing. (A sketch of the smoothing recursion appears after this list.) Double (Brown) Smoothing
specifies double exponential smoothing by using Brown’s one parameter model (single exponential smoothing applied twice). Seasonal Smoothing
specifies seasonal exponential smoothing. (This is like Winters method with the trend term omitted.) Linear (Holt) Smoothing
specifies exponential smoothing of both the series level and trend (Holt’s two parameter model). Damped-Trend Smoothing
specifies exponential smoothing of both the series level and trend with a trend damping weight. Winters Method - Additive
specifies Winters method with additive seasonal factors. Winters Method - Multiplicative
specifies Winters method with multiplicative seasonal factors. Smoothing Weights
displays the values used for the smoothing weights. By default, the Smoothing Weights fields are set to “optimize,” which means that the system will compute the weight values that best fit the data. You can also type smoothing weight values in these fields. Level
is the smoothing weight used for the level of the series. Trend
is the smoothing weight used for the trend of the series. Damping
is the smoothing weight used by the damped-trend method to damp the forecasted trend towards zero as the forecast horizon increases. Season
is the smoothing weight used for the seasonal factors in Winters method and seasonal exponential smoothing.
Transformation
displays the series transformation specified for the model. When a transformation is specified, the model is fit to the transformed series, and forecasts are produced by applying the inverse transformation to the model predictions. Select Log, Logistic, Square Root, Box-Cox, or None from the pop-up list. Bounds
displays the constraints imposed on the fitted smoothing weights. Select one of the following from the pop-up list: Zero-One/Additive
sets the smoothing weight optimization region to the intersection of the region bounded by the intervals from zero (0.001) to one (0.999) and the additive invertible region. This is the default. Zero-One Boundaries
sets the smoothing weight optimization region to the region bounded by the intervals from zero (0.001) to one (0.999). Additive Invertible
sets the smoothing weight optimization region to the additive invertible region. Unrestricted
sets the smoothing weight optimization region to be unbounded. Custom
opens the Smoothing Weights window to enable you to customize the constraints for smoothing weights optimization. OK
closes the Smoothing Model Specification window and fits the model you specified. Cancel
closes the Smoothing Model Specification window without fitting the model. Any options you specified are lost. Reset
resets all options to their initial values upon entry to the window. This might be useful when editing an existing model specification; otherwise, Reset has the same function as Clear. Clear
resets all options to their default values.
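As background for the Simple Smoothing method, the following DATA step sketch shows the level recursion with a fixed weight of 0.3; the names are hypothetical, and in practice the system optimizes the weight rather than fixing it:

   data work.smooth;
      set work.sales;
      retain level;
      if _n_ = 1 then level = y;               /* initialize with the first value */
      else level = 0.3*y + (1 - 0.3)*level;    /* smoothing weight = 0.3          */
   run;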
Smoothing Weight Optimization Window Use the Smoothing Weight Optimization window to specify constraints for the automatic fitting of smoothing weights for exponential smoothing and Winters method models. Access it from the Smoothing Model Specification window when you select “Custom” in the “Bounds” combo box.
Controls and Fields No restrictions
when selected, specifies unrestricted smoothing weights. Bounded region
when selected, restricts the fitted smoothing weights to be within the bounds that you specify with the “Smoothing Weight Bounded Region” options. Additive invertible region
when selected, restricts the fitted smoothing weights to be within the additive invertible region of the parameter space of the ARIMA model equivalent to the smoothing model. (See the section “Smoothing Models” on page 2897 for details.) Additive invertible and bounded region
when selected, restricts the fitted smoothing weights to be both within the additive invertible region and within bounds that you specify. Smoothing Weight Bounded Region
is a group of numeric entry fields that enable you to specify lower and upper limits on the fitted value of each smoothing weight. The fields that appear in this part of the window depend on the kind of smoothing model that you specified. OK
closes the window and sets the options that you specified. Cancel
closes the window without changing any options. Any values you specified are lost. Reset
resets all options to their initial values upon entry to the window. Clear
resets all options to their default values.
Statistics of Fit Selection Window Use the Statistics of Fit Selection window to specify which of the available goodness-of-fit statistics are reported for models you fit and are available for selection as the model selection criterion used by the automatic selection process. This window is available under the Options menu in the Develop Models, Automatic Model Fitting, Produce Forecasts, and Model List windows, and from the Statistics button of the Model Fit Comparison window and Automatic Model Fitting results windows.
Controls and Fields Select Statistics Table
lists the available statistics. Select a row of the table to select or deselect the statistic shown in that row. OK
closes the window and applies the selections made. Cancel
closes the window without applying any selections. Clear
deselects all statistics. All
selects all statistics.
Time ID Creation – 1,2,3 Window Use the Time ID Creation – 1,2,3 window to add a time ID variable to an input data set with observation numbers as the ID values. The interval for the series will be 1. Use this approach if the data frequency does not match any of the system’s date or date-time intervals, or if other methods of assigning a time ID do not work. To access this window, select “Create from observation numbers” from the Create pop-up list in any window where you can select a Time ID variable. For more information, see Chapter 4, “Date Intervals, Formats, and Functions,” in this book.
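The effect is equivalent to the following DATA step sketch, in which the data set name WORK.SALES and the variable name T are hypothetical:

   data work.sales;
      set work.sales;
      t = _n_;   /* observation number becomes the time ID */
   run;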
Controls and Fields Data set name
is the name of the input data set. New ID variable name
is the name of the time ID variable to be created. You can type any valid SAS variable name in this field. OK
closes the window and proceeds to the next step in the time ID creation process. Cancel
closes the window without creating a Time ID variable. Any options you specified are lost.
Time ID Creation from Several Variables Window Use the Time ID Creation from Several Variables window to add a SAS date valued time ID variable to an input data set when the input data set already contains several dating variables, such as day, month, and year. To access this window, select “Create from existing variables” from the Create pop-up list in any window where you can select a Time ID variable. For more information, see Chapter 40, “Creating Time ID Variables.”
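For example, if the input data set contains numeric variables YEAR, MONTH, and DAY (hypothetical names), the window builds a date value much as the MDY function does in this DATA step sketch:

   data work.sales;
      set work.sales;
      date = mdy(month, day, year);   /* combine the date parts */
      format date date9.;
   run;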
Controls and Fields Variables
is a list of variables in the input data set. Select existing ID variables from this list. Date Part
is a list of date parts that you can specify for the selected ID variable. For each ID variable that you select from the Variables list, select the Date Part value that describes what the values of the ID variable represent. arrow button
moves the selected existing ID variable and date part specification to the “Existing Time IDs” list. Once you have done this, you can select another ID variable from the Variables list. New variable
is the name of the time ID variable to be created. You can type any valid SAS variable name in this field. New interval
is the time interval between observations in the input data set implied by the date part ID variables you have selected. OK
closes the window and proceeds to the next step in the time ID creation process. Cancel
closes the window without creating a time ID. Any options you specified are lost. Reset
resets the options to their initial values upon entry to the window.
Time ID Creation from Starting Date Window Use the Time ID Creation from Starting Date window to add a SAS date valued time ID variable to an input data set. This is a convenient way to add a time ID of any interval as long as you know the starting date of the series. To access this window, select “Create from starting date and frequency” from the Create pop-up list in any window where you can select a Time ID variable. For more information, see Chapter 40, “Creating Time ID Variables.”
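The effect is similar to the following DATA step sketch, which uses the INTNX function; the data set name, starting date, and interval shown are hypothetical:

   data work.sales;
      set work.sales;
      /* observation _N_ falls _N_ - 1 months after the start */
      date = intnx('month', '01feb1997'd, _n_ - 1);
      format date monyy7.;
   run;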
Controls and Fields Data set name
is the name of the input data set. Starting Date
is the starting date for the time series in the data set. Enter a date value in this field, using a form recognizable by a SAS date informat, for example, 1998:1, feb1997, or 03mar1998. Interval
is the time interval between observations in the data set. Select an interval from the pop-up list. New ID variable name
is the name of the time ID variable to be created. You can type any valid SAS variable name in this field. OK
closes the window and proceeds to the next step in the time ID creation process. Cancel
closes the window without changing the input data set. Any options you specified are lost.
Time ID Creation Using Informat Window Use the Time ID Creation using Informat window to add a SAS date valued time ID variable to an input data set. Use this window if your data set contains a date variable that is stored as a character string. Using the appropriate SAS date informat, the date string is read in and used to create a date or date-time variable. To access this window, select “Create from existing variable/informat” from the Create pop-up list in any window where you can select a Time ID variable.
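The effect is similar to the following DATA step sketch, assuming a character variable DATESTR (a hypothetical name) that holds values such as 03MAR1998:

   data work.sales;
      set work.sales;
      date = input(datestr, date9.);   /* read the string with a date informat */
      format date date9.;
   run;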
Controls and Fields Variable Name
is the name of an existing ID variable in the input data set. Click the Select button to select a variable. Select button
opens a list of variables in the input data set for you to select from. Informat
is a SAS date or datetime informat for reading date or datetime value from the values of the specified existing ID variable. You can type in an informat or select one from the pop-up list. First Obs
is the value of the variable you selected from the first observation in the data set, displayed here for convenience. Date Value
is the SAS date or datetime value read from the first observation’s value by using the informat that you specified.
New ID variable name
is the name of the time ID variable to be created. You can type any valid SAS variable name in this field. OK
closes the window and proceeds to the next step in the time ID creation process. Cancel
closes the window without changing the input data set. Any options you specified are lost. Reset
resets the options to their initial values upon entry to the window.
Time ID Variable Specification Window

Use the Time ID Variable Specification window to specify a variable in the input data set that contains the SAS date or datetime value of each observation. You do not need to use this window if your time ID variable is named date, time, or datetime, since these are picked up automatically. Invoke the window from the Select button to the right of the Time ID field in the Data Set Selection, Automatic Model Fitting, Produce Forecasts, Series Selection, and Time Series Forecasting windows.

Controls and Fields

Data Set
is the name of the current input data set.

Time ID
is the name of the currently selected Time ID variable, if any.
Interval
is the time interval between observations (data frequency) in the input data set.

Select a Time ID Variable
is a selection list of variables in the input data set. Select one variable to assign it as the Time ID variable.

OK
closes the window and retains the selection made, if it is a valid time ID.

Cancel
closes the window and ignores any selection made.

Reset
restores the time ID variable to the one assigned when the window was initially opened, if any.
Time Ranges Specification Window

Use the Time Ranges Specification window to control the period of fit and evaluation and the forecasting horizon. Invoke this window from the Options menu in the Develop Models, Manage Forecasting Project, and Model Viewer windows or the Set Ranges button in the Develop Models window.
Controls and Fields

Data Set
is the name of the current input data set.

Interval
is the time interval (data frequency) for the input data set.

Series
is the variable name and label of the current time series.

Data Range
gives the dates of the first and last nonmissing data values available for the current series in the input data set.

Period of Fit
gives the starting and ending dates of the period of fit. This is the time range used for estimating model parameters. By default, it is the same as the data range. You can type dates in these fields, or you can use the arrow buttons to the left and right of the date fields to decrement or increment the date values shown. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.) The inner arrows increment by periods; the outer arrows increment by larger amounts, depending on the data interval.

Period of Evaluation
gives the starting and ending dates of the period of evaluation. This is the time range used for evaluating models in terms of statistics of fit. By default, it is the same as the data range. You can type dates in these fields, or you can use the control arrows to the left and right of the date fields to decrement or increment the date values shown. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.) The inner arrows increment by periods; the outer arrows increment by larger amounts, depending on the data interval.

Forecast Horizon
is the forecasting horizon expressed as a number of forecast periods or number of years (or number of weeks for daily data). You can type a number or select one from the pop-up list. The ending date for the forecast period is automatically updated when you change the number of forecast periods.

Forecast Horizon - Units
indicates whether the Forecast Horizon value represents periods or years (or weeks for daily data).

Forecast Horizon Date Value
is the date of the last forecast observation. You can type a date in this field, or you can use the arrow buttons to the left and right of the date field to decrement or increment the date values shown. Date values must be entered in a form recognized by a SAS date informat. (See SAS Language Reference: Concepts for information about SAS date informats.) The Forecast Horizon is automatically updated when you change the ending date for the forecast period.

Hold-out Sample
specifies that a number of observations or years (or weeks) of data at the end of the data range are used for the period of evaluation, with the remainder of the data used as the period of fit. You can type a number in this field or select one from the pop-up list. When the hold-out sample value is changed, the Period of Fit and Period of Evaluation ranges are changed to reflect the hold-out sample specification.
Hold-out Sample - Units
indicates whether the hold-out sample field represents periods or years (or weeks for daily data).

OK
closes the window and stores the specified changes.

Cancel
closes the window without saving changes. Any options you specified are lost.

Reset
resets the options to their initial values upon entry to the window.

Clear
resets all options to their default values.
Time Series Forecasting Window

The Time Series Forecasting window is the main application window that appears when you invoke the Time Series Forecasting System. It enables you to specify a project file and an input data set and provides access to the other windows described in this chapter.
Controls and Fields

Project
is the name of the SAS catalog entry in which forecasting models and other results will be stored and from which previously stored results are loaded into the forecasting system. You can specify the project by typing a SAS catalog entry name in this field or by selecting the Browse button to the right of this field. If you specify the name of an existing catalog entry, the information in the project file is loaded. If you specify a one-level name, the catalog name is assumed to be fmsproj and the library is assumed to be sasuser. For example, samproj is equivalent to sasuser.fmsproj.samproj.

Project Browse button
opens the Forecasting Project File Selection window to enable you to select and load the project from a list of previously stored projects.

Description
is a descriptive label for the forecasting project. The description you type in this field will be stored with the catalog entry shown in the Project field.

Data Set
is the name of the current input data set. To specify the input data set, you can type the data set name in this field or use the Browse button to the right of the field.

Data set Browse button
opens the Data Set Selection window to enable you to select the input data set.

Time ID
is the name of the ID variable for the input data set. To specify the ID variable, you can type the ID variable name in this field or use the Select button. If the time ID variable is named date, time, or datetime, it is automatically picked up by the system.

Select button
opens the Time ID Variable Specification window.

Create button
opens a menu of choices of methods for creating a time ID variable for the input data set. Use this feature if the input data set does not already contain a valid time ID variable.

Interval
is the time interval between observations (data frequency) in the current input data set. If the interval is not automatically filled in, you can type an interval name or select one from the pop-up list. For more information about intervals, see the section “Time Series Data Sets, ID Variables, and Time Intervals” on page 2623.

View Series Graphically icon
opens the Time Series Viewer window to display plots of series in the current input data set.

View Data as a Table
opens a Viewtable window for browsing the selected input data set.

Develop Models
opens the Develop Models window to enable you to fit forecasting models to individual time series and choose the best models to use to produce the final forecasts of each series.
Fit Models Automatically
opens the Automatic Model Fitting window for applying the automatic model selection process to all series or to selected series in an input data set.

Produce Forecast
opens the Produce Forecasts window for producing forecasts for the series in the current input data set for which you have fit forecasting models.

Manage Projects
opens the Manage Forecasting Project window for viewing or editing information stored in projects.

Exit
closes the Time Series Forecasting system.

Help
accesses the help system.
Time Series Simulation Window

Use the Time Series Simulation window to create a data set of simulated series generated by ARIMA processes. Access this window from the Tools menu in the Develop Models and Manage Forecasting Project windows.
Controls and Fields

Output Data Set
is the name of the data set to be created. Type in a one-level or two-level SAS data set name.

Interval
is the time interval between observations (data frequency) in the simulated data set. Type in an interval name or select one from the pop-up list.

Seed
is the seed for the random number generator used to produce the simulated time series.

N Observations
is the number of time periods to simulate.

Starting Date
is the starting date for the simulated observations. Type in a date in a form recognizable by a SAS date informat, for example, 1998:1, feb1997, or 03mar1998.

Ending Date
is the ending date for the simulated observations. Type in a date in a form recognizable by a SAS date informat.

Series to Generate
is the list of variable names and ARIMA processes to simulate.

Add Series
opens the ARIMA Process Specification window to enable you to add entries to the Series to Generate list.

Delete Series
deletes selected (highlighted) entries from the Series to Generate list.

OK
closes the Time Series Simulation window, performs the specified simulations, and creates the specified data set.

Cancel
closes the window without creating a simulated data set. Any options you specified are lost.
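A comparable simulation can also be written directly in a DATA step. The following sketch generates one AR(1) series, which is an ARIMA(1,0,0) process; the data set name, seed, and coefficient value are hypothetical:

   /* Simulate 100 monthly observations from an AR(1) process (hypothetical values) */
   data work.sim;
      phi = 0.7;                             /* autoregressive coefficient      */
      y   = 0;                               /* starting value                  */
      do t = 1 to 100;
         date = intnx( 'month', '01jan1998'd, t - 1 );
         y = phi * y + rannor( 12345 );      /* AR(1) update with N(0,1) shocks */
         output;
      end;
      format date monyy7.;
      keep date y;
   run;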
Time Series Viewer Window

Use the Time Series Viewer window to explore time series data using plots, transformations, statistical tests, and tables. It is available as a standalone application and as part of the Time Series Forecasting System. To use it as a standalone application, select it from the Analysis submenu of the Solutions menu, or use the tsview command (see Chapter 44, “Command Reference,” in this book). To use it within the Time Series Forecasting System, select the View Series Graphically icon in the Time Series Forecasting, Develop Models, or Model List window, or select “Series” from the View menu of the Develop Models, Manage Project, or Model List window. The various plots and tables available are referred to as views. The section “View Selection Icons” on page 2845 explains how to change the view.
The state of the Time Series Viewer window is controlled by the current series, the current series transformation specification, and the currently selected view. You can resize this window, and you can use other windows without closing the Time Series Viewer window.

You can explore a number of series conveniently by keeping the Series Selection window open. Each time you make a selection, the viewer window is updated to show the selected series. Keep both windows visible, or switch between them by using the Next Viewer toolbar icon or the F12 function key.

You can open multiple Time Series Viewer windows. This enables you to “freeze” a plot so you can come back to it later, or compare two plots side by side on your screen. To do this, unlink the viewer by using the Link/Unlink icon on the window’s toolbar or the corresponding item in the Tools menu. While the viewer window remains unlinked, it is not updated when other selections are made in the Series Selection window. Instead, when you select a series and click the Graph button, a new Time Series Viewer window is invoked. You can continue this process to open as many viewer windows as you want. The Next Viewer icon and corresponding F12 function key are useful for navigating between windows when they are not simultaneously visible on your screen.

A wide range of series transformations is available. Basic transformations are available from the window’s horizontal toolbar, and others are available by selecting “Other Transformations” from the Tools menu.
Horizontal Tool Bar

The Time Series Viewer window contains a horizontal toolbar with the following icons:
Zoom in
changes the mouse cursor into cross hairs that you can use with the left mouse button to drag out a region of the time series plot to zoom in on. In the Autocorrelations view and the White Noise and Stationarity Tests view, Zoom In reduces the number of lags displayed.

Zoom out
reverses the previous Zoom In action and expands the time range of the plot to show more of the series. In the Autocorrelations view and the White Noise and Stationarity Tests view, Zoom Out increases the number of lags displayed.

Link/Unlink viewer
disconnects or connects the Time Series Viewer window to the window in which the series was selected. When the Viewer is linked, it always shows the current series. If you select another series, linked Viewers are updated. Unlinking a Viewer freezes its current state, and changing the current series has no effect on the Viewer’s display. The View Series action creates a new Series Viewer window if there is no linked Viewer. By using the unlink feature, you can open several Time Series Viewer windows and display several different series simultaneously.

Log Transform
applies a log transform to the current view. This can be combined with other transformations; the current transformations are shown in the title.

Difference
applies a simple difference to the current view. This can be combined with other transformations; the current transformations are shown in the title.

Seasonal Difference
applies a seasonal difference to the current view. For example, if the data are monthly, the seasonal cycle is one year. Each value has subtracted from it the value from one year previous. This can be combined with other transformations; the current transformations are shown in the title.

Close
closes the Time Series Viewer window and returns to the window from which it was invoked.
Vertical Toolbar View Selection Icons

At the right-hand side of the Time Series Viewer window is a vertical toolbar used to select the kind of plot or table that the Viewer displays.

Series
displays a plot of series values over time.

Autocorrelations
displays plots of the sample autocorrelation, partial autocorrelation, and inverse autocorrelation functions for the series, with lines overlaid at plus and minus two standard errors.

White Noise and Stationarity Tests
displays horizontal bar charts that represent results of white noise and stationarity tests. The first bar chart shows the significance probability of the Ljung-Box chi-square statistic computed on autocorrelations up to the given lag. Longer bars favor rejection of the null hypothesis that the series is white noise. Click any of the bars to display an interpretation.
The second bar chart shows tests of stationarity, where longer bars favor the conclusion that the series is stationary. Each bar displays the significance probability of the augmented Dickey-Fuller unit root test to the given autoregressive lag. Long bars represent higher levels of significance against the null hypothesis that the series contains a unit root. For seasonal data, a third bar chart appears for seasonal root tests. Click any of the bars to display an interpretation.

Data Table
displays a data table containing the values in the input data set.
Menu Bar

File

Save Graph
saves the current plot as a SAS/GRAPH grseg catalog entry in a default or most recently specified catalog. This item is unavailable in the Data Table view.

Save Graph as
saves the current graph as a SAS/GRAPH grseg catalog entry in a SAS catalog that you specify and/or as an Output Delivery System (ODS) object. By default, an HTML page is created, with the graph embedded as a GIF image. This item is unavailable in the Data Table view.

Save Data
saves the data displayed in the viewer window to an output SAS data set. This item is unavailable in the Series view.

Save Data as
saves the data in a SAS data set that you specify and/or as an Output Delivery System (ODS) object. By default, an HTML page is created, with the data displayed as a table.

Import Data
is available if you license SAS/ACCESS software. It opens an Import Wizard, which you can use to import your data from an external spreadsheet or database to a SAS data set for use in the Time Series Forecasting System.

Export Data
is available if you license SAS/ACCESS software. It opens an Export Wizard, which you can use to export a SAS data set, such as a forecast data set created with the Time Series Forecasting System, to an external spreadsheet or database.

Print Graph
prints the plot displayed in the viewer window. This item is unavailable in the Data Table view.

Print Data
prints the data displayed in the viewer window. This item is unavailable in the Series view.

Print Setup
opens the Print Setup window, which allows you to access your operating system print setup.
Print Preview
opens a preview window to show how your plots will look when printed.

Close
closes the Time Series Viewer window and returns to the window from which it was invoked.

View

Series
displays a plot of series values over time. This is the same as the Series icon in the vertical toolbar.

Autocorrelations
displays plots of the sample autocorrelation, partial autocorrelation, and inverse autocorrelation functions for the series. This is the same as the Autocorrelations icon in the vertical toolbar.

White Noise and Stationarity Tests
displays horizontal bar charts representing results of white noise and stationarity tests. This is the same as the White Noise and Stationarity Tests icon in the vertical toolbar.

Data Table
displays a data table containing the values in the input data set. This is the same as the Data Table icon in the vertical toolbar.

Zoom In
zooms the display. This is the same as the Zoom In icon in the window’s horizontal toolbar.

Zoom Out
undoes the last Zoom In action. This is the same as the Zoom Out icon in the window’s horizontal toolbar.

Zoom Way Out
reverses all previous Zoom In actions and expands the time range of the plot to show all of the series, or shows the maximum number of lags in the Autocorrelations view or the White Noise and Stationarity Tests view.

Tools

Log Transform
applies a log transformation. This is the same as the Log Transform icon in the window’s horizontal toolbar.

Difference
applies simple differencing. This is the same as the Difference icon in the window’s horizontal toolbar.

Seasonal Difference
applies seasonal differencing. This is the same as the Seasonal Difference icon in the window’s horizontal toolbar.
Other Transformations
opens the Series Viewer Transformations window to enable you to apply a wide range of transformations.

Diagnose Series
opens the Series Diagnostics window to determine the kinds of forecasting models appropriate for the current series.

Define Interventions
opens the Interventions for Series window to enable you to edit or add intervention effects for use in modeling the current series.

Link Viewer
connects or disconnects the Time Series Viewer window to the window from which series are selected. This is the same as the Link item in the window’s horizontal toolbar.

Options

Number of Lags
opens a window to let you specify the number of lags shown in the Autocorrelations view and the White Noise and Stationarity Tests view. You can also use the Zoom In and Zoom Out actions to control the number of lags displayed.

Correlation Probabilities
controls whether the bar charts in the Autocorrelations view represent significance probabilities or values of the correlation coefficient. A check mark or filled check box next to this item indicates that significance probabilities are displayed. In each case the bar graph horizontal axis label changes accordingly.
Mouse Button Actions

You can examine the data value and date of individual points in the Series view by clicking them. The date and value are displayed in a box that appears in the upper right corner of the Viewer window. Click the mouse elsewhere or select any action to dismiss the data box.

You can examine the values of the bars and confidence limits at different lags in the Autocorrelations view by clicking individual bars in the vertical bar charts. You can display an interpretation of the tests in the White Noise and Stationarity Tests view by clicking the bars.

When you select the Zoom In action, you can use the mouse to define a region of the graph to take a closer look at. Position the mouse cursor at one corner of the region, press the left mouse button, and move the mouse cursor to the opposite corner of the region while holding the left mouse button down. When you release the mouse button, the plot is redrawn to show an expanded view of the data within the region you selected.
Chapter 46
Forecasting Process Details

Contents

Forecasting Process Summary . . . . . . . . . . . . . . . . . . . . . . . 2889
    Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . 2890
    Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 2890
    Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2892
    Forecast Combination Models . . . . . . . . . . . . . . . . . . . . 2894
    External or User-Supplied Forecasts . . . . . . . . . . . . . . . . . 2894
    Adjustments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2895
    Series Transformations . . . . . . . . . . . . . . . . . . . . . . . 2895
Smoothing Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2897
    Smoothing Model Calculations . . . . . . . . . . . . . . . . . . . . 2897
    Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . . . 2898
    Predictions and Prediction Errors . . . . . . . . . . . . . . . . . . 2898
    Smoothing Weights . . . . . . . . . . . . . . . . . . . . . . . . . 2899
    Equations for the Smoothing Models . . . . . . . . . . . . . . . . . 2900
ARIMA Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2908
    Notation for ARIMA Models . . . . . . . . . . . . . . . . . . . . . 2908
Predictor Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2912
    Time Trend Curves . . . . . . . . . . . . . . . . . . . . . . . . . . 2912
    Intervention Effects . . . . . . . . . . . . . . . . . . . . . . . . . 2913
    Seasonal Dummy Inputs . . . . . . . . . . . . . . . . . . . . . . . 2915
Series Diagnostic Tests . . . . . . . . . . . . . . . . . . . . . . . . . . 2915
Statistics of Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2916
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2918
This chapter provides computational details on several aspects of the Time Series Forecasting System.
Forecasting Process Summary

This section summarizes the forecasting process.
Parameter Estimation

The parameter estimation process for ARIMA and smoothing models is described graphically in Figure 46.1.

Figure 46.1 Model Fitting Flow Diagram

The specification of smoothing and ARIMA models is described in Chapter 41, “Specifying Forecasting Models.” Computational details for these kinds of models are provided in the following sections “Smoothing Models” on page 2897 and “ARIMA Models” on page 2908. The results of the parameter estimation process are displayed in the Parameter Estimates table of the Model Viewer window along with the estimate of the model variance and the final smoothing state.
Model Evaluation

The model evaluation process is described graphically in Figure 46.2.
Figure 46.2 Model Evaluation Flow Diagram
Model evaluation is based on the one-step-ahead prediction errors for observations within the period of evaluation. The one-step-ahead predictions are generated from the model specification and
parameter estimates. The predictions are inverse transformed (median or mean) and adjustments are removed. The prediction errors (the difference of the dependent series and the predictions) are used to compute the statistics of fit, which are described in the section “Series Diagnostic Tests” on page 2915. The results generated by the evaluation process are displayed in the Statistics of Fit table of the Model Viewer window.
Forecasting

The forecast generation process is described graphically in Figure 46.3.
Figure 46.3 Forecasting Flow Diagram
The forecasting process is similar to the model evaluation process described in the preceding section, except that k-step-ahead predictions are made from the end of the data through the specified forecast horizon, and prediction standard errors and confidence limits are calculated. The forecasts and confidence limits are displayed in the Forecast plot or table of the Model Viewer window.
Forecast Combination Models

This section discusses the computation of predicted values and confidence limits for forecast combination models. See Chapter 41, “Specifying Forecasting Models,” for information about how to specify forecast combination models and their combining weights.

Given the response time series $\{y_t : 1 \le t \le n\}$ with previously generated forecasts for the $m$ component models, a combined forecast is created from the component forecasts as follows:

Predictions: $\hat{y}_t = \sum_{i=1}^{m} w_i \hat{y}_{i,t}$

Prediction Errors: $\hat{e}_t = y_t - \hat{y}_t$

where $\hat{y}_{i,t}$ are the forecasts of the component models and $w_i$ are the combining weights. The estimate of the root mean square prediction error and forecast confidence limits for the combined forecast are computed by assuming independence of the prediction errors of the component forecasts, as follows:

Standard Errors: $\hat{\sigma}_t = \sqrt{\sum_{i=1}^{m} w_i^2 \hat{\sigma}_{i,t}^2}$

Confidence Limits: $\hat{y}_t \pm \hat{\sigma}_t Z_{\alpha/2}$

where $\hat{\sigma}_{i,t}$ are the estimated root mean square prediction errors for the component models, $\alpha$ is the confidence limit width, $1 - \alpha$ is the confidence level, and $Z_{\alpha/2}$ is the $\frac{\alpha}{2}$ quantile of the standard normal distribution. Since, in practice, there might be positive correlation between the prediction errors of the component forecasts, these confidence limits may be too narrow.
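As a concrete illustration, this combination arithmetic can be reproduced in a DATA step. This is only a sketch; the input data set and variable names (two component forecasts f1 and f2 with standard errors s1 and s2, and equal weights) are hypothetical:

   /* Combine two component forecasts with weights 0.5/0.5 (hypothetical names) */
   data work.combined;
      set work.fcst;                          /* contains f1, f2, s1, s2       */
      w1 = 0.5;  w2 = 0.5;
      yhat  = w1*f1 + w2*f2;                  /* combined prediction           */
      sigma = sqrt( w1**2 * s1**2 + w2**2 * s2**2 );  /* assumes independent errors */
      lower = yhat - probit(0.975) * sigma;   /* 95% confidence limits         */
      upper = yhat + probit(0.975) * sigma;
   run;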
External or User-Supplied Forecasts

This section discusses the computation of predicted values and confidence limits for external forecast models.

Given a response time series $y_t$ and an external forecast series $\hat{y}_t$, the prediction errors are computed as $\hat{e}_t = y_t - \hat{y}_t$ for those $t$ for which both $y_t$ and $\hat{y}_t$ are nonmissing. The mean squared error (MSE) is computed from the prediction errors.

The variance of the $k$-step-ahead prediction errors is set to $k$ times the MSE. From these variances, the standard errors and confidence limits are computed in the usual way. If the supplied predictions contain so many missing values within the time range of the response series that the MSE estimate cannot be computed, the confidence limits, standard errors, and statistics of fit are set to missing.
Adjustments

Adjustment predictors are subtracted from the response time series prior to model parameter estimation, evaluation, and forecasting. After the predictions of the adjusted response time series are obtained from the forecasting model, the adjustments are added back to produce the forecasts.

If $y_t$ is the response time series and $X_{i,t}$, $1 \le i \le m$, are $m$ adjustment predictor series, then the adjusted response series $w_t$ is

\[ w_t = y_t - \sum_{i=1}^{m} X_{i,t} \]

Parameter estimation for the model is performed by using the adjusted response time series $w_t$. The forecasts $\hat{w}_t$ of $w_t$ are adjusted to obtain the forecasts $\hat{y}_t$ of $y_t$:

\[ \hat{y}_t = \hat{w}_t + \sum_{i=1}^{m} X_{i,t} \]

Missing values in an adjustment series are ignored in these computations.
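The same subtract-and-add-back logic is easy to express in DATA steps. A minimal sketch with one hypothetical adjustment series adj; the data set names and the assumption that the forecast data set aligns observation-for-observation with the input are illustrative only:

   /* Remove an adjustment before modeling (hypothetical names) */
   data work.adjusted;
      set work.series;
      w = y - adj;          /* adjusted response used for estimation */
   run;

   /* ... model w and obtain what, the forecast of w ... */

   data work.final;
      merge work.adjusted work.fcst;   /* work.fcst assumed to contain what */
      yhat = what + adj;               /* add the adjustment back            */
   run;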
Series Transformations

For pure ARIMA models, transforming the response time series can aid in obtaining stationary noise series. For general ARIMA models with inputs, transforming the response time series or one or more of the input time series can provide a better model fit. Similarly, the fit of smoothing models can improve when the response series is transformed.

There are four transformations available, for strictly positive series only. Let $y_t > 0$ be the original time series, and let $w_t$ be the transformed series. The transformations are defined as follows:

Log
is the logarithmic transformation,
\[ w_t = \ln(y_t) \]

Logistic
is the logistic transformation,
\[ w_t = \ln( c y_t / (1 - c y_t) ) \]
where the scaling factor $c$ is
\[ c = (1 - 10^{-6}) \, 10^{-\mathrm{ceil}(\log_{10}(\max(y_t)))} \]
and $\mathrm{ceil}(x)$ is the smallest integer greater than or equal to $x$.

Square Root
is the square root transformation,
\[ w_t = \sqrt{y_t} \]

Box-Cox
is the Box-Cox transformation,
\[ w_t = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\ \ln(y_t), & \lambda = 0 \end{cases} \]

Parameter estimation is performed by using the transformed series. The transformed model predictions and confidence limits are then obtained from the transformed time series and these parameter estimates. The transformed model predictions $\hat{w}_t$ are used to obtain either the minimum mean absolute error (MMAE) or minimum mean squared error (MMSE) predictions $\hat{y}_t$, depending on the setting of the forecast options. The model is then evaluated based on the residuals of the original time series and these predictions. The transformed model confidence limits are inverse-transformed to obtain the forecast confidence limits.
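For reference, the forward transformations can be computed directly in a DATA step. The following sketch applies three of them to a hypothetical positive series y; the Box-Cox parameter value is an assumption, and the logistic transformation is omitted because its scaling factor c requires a prior pass to compute max(y):

   /* Forward transformations for a strictly positive series (hypothetical names) */
   data work.trans;
      set work.series;
      lambda   = 0.5;                        /* assumed Box-Cox parameter   */
      w_log    = log(y);
      w_sqrt   = sqrt(y);
      w_boxcox = (y**lambda - 1) / lambda;   /* use log(y) when lambda = 0  */
   run;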
Predictions for Transformed Models

Since the transformations described in the previous section are monotonic, applying the inverse transformation to the transformed model predictions results in the median of the conditional probability density function at each point in time. This is the minimum mean absolute error (MMAE) prediction. If $w_t = F(y_t)$ is the transform with inverse transform $y_t = F^{-1}(w_t)$, then

\[ \mathrm{median}(\hat{y}_t) = F^{-1}(E[w_t]) = F^{-1}(\hat{w}_t) \]

The minimum mean squared error (MMSE) predictions are the mean of the conditional probability density function at each point in time. Assuming that the prediction errors are normally distributed with variance $\sigma_t^2$, the MMSE predictions for each of the transformations are as follows:

Log
is the conditional expectation of the inverse logarithmic transformation,
\[ \hat{y}_t = E[e^{w_t}] = \exp( \hat{w}_t + \sigma_t^2 / 2 ) \]

Logistic
is the conditional expectation of the inverse logistic transformation,
\[ \hat{y}_t = E\left[ \frac{1}{c(1 + \exp(-w_t))} \right] \]
where the scaling factor $c = (1 - 10^{-6}) \, 10^{-\mathrm{ceil}(\log_{10}(\max(y_t)))}$.

Square Root
is the conditional expectation of the inverse square root transformation,
\[ \hat{y}_t = E[w_t^2] = \hat{w}_t^2 + \sigma_t^2 \]

Box-Cox
is the conditional expectation of the inverse Box-Cox transformation,
\[ \hat{y}_t = \begin{cases} E\left[ (\lambda w_t + 1)^{1/\lambda} \right], & \lambda \ne 0 \\ E[e^{w_t}] = \exp( \hat{w}_t + \tfrac{1}{2}\sigma_t^2 ), & \lambda = 0 \end{cases} \]

The expectations of the inverse logistic and Box-Cox ($\lambda \ne 0$) transformations do not generally have explicit solutions and are computed by using numerical integration.
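For example, with a log transformation, a transformed prediction of $\hat{w}_t = 2$ and prediction error variance $\sigma_t^2 = 0.04$ give an MMAE (median) prediction of $\exp(2) \approx 7.389$ and an MMSE (mean) prediction of $\exp(2 + 0.04/2) = \exp(2.02) \approx 7.538$; the mean exceeds the median because the inverse-transformed distribution is right-skewed.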
Smoothing Models

This section details the computations performed for the exponential smoothing and Winters method forecasting models.
Smoothing Model Calculations

The descriptions and properties of various smoothing methods can be found in Gardner (1985), Chatfield (1978), and Bowerman and O’Connell (1979). The following section summarizes the smoothing model computations.

Given a time series $\{Y_t : 1 \le t \le n\}$, the underlying model assumed by the smoothing models has the following (additive seasonal) form:

\[ Y_t = \mu_t + \beta_t t + s_p(t) + \epsilon_t \]

where

$\mu_t$ represents the time-varying mean term.

$\beta_t$ represents the time-varying slope.

$s_p(t)$ represents the time-varying seasonal contribution for one of the $p$ seasons.

$\epsilon_t$ are disturbances.

For smoothing models without trend terms, $\beta_t = 0$; and for smoothing models without seasonal terms, $s_p(t) = 0$. Each smoothing model is described in the following sections.

At each time $t$, the smoothing models estimate the time-varying components described above with the smoothing state. After initialization, the smoothing state is updated for each observation using the smoothing equations. The smoothing state at the last nonmissing observation is used for predictions.

Smoothing State and Smoothing Equations

Depending on the smoothing model, the smoothing state at time $t$ consists of the following:

$L_t$ is a smoothed level that estimates $\mu_t$.

$T_t$ is a smoothed trend that estimates $\beta_t$.

$S_{t-j}$, $j = 0, \ldots, p-1$, are seasonal factors that estimate $s_p(t)$.

The smoothing process starts with an initial estimate of the smoothing state, which is subsequently updated for each observation by using the smoothing equations.

The smoothing equations determine how the smoothing state changes as time progresses. Knowledge of the smoothing state at time $t-1$ and that of the time series value at time $t$ uniquely determine the smoothing state at time $t$. The smoothing weights determine the contribution of the previous smoothing state to the current smoothing state. The smoothing equations for each smoothing model are listed in the following sections.
Smoothing State Initialization

Given a time series $\{Y_t : 1 \le t \le n\}$, the smoothing process first computes the smoothing state for time $t = 1$. However, this computation requires an initial estimate of the smoothing state at time $t = 0$, even though no data exists at or before time $t = 0$.

An appropriate choice for the initial smoothing state is made by backcasting from time $t = n$ to $t = 1$ to obtain a prediction at $t = 0$. The initialization for the backcast is obtained by regression with constant and linear terms and seasonal dummies (additive or multiplicative) as appropriate for the smoothing model. For models with linear or seasonal terms, the estimates obtained by the regression are used for initial smoothed trend and seasonal factors; however, the initial smoothed level for backcasting is always set to the last observation, $Y_n$. The smoothing state at time $t = 0$ obtained from the backcast is used to initialize the smoothing process from time $t = 1$ to $t = n$ (Chatfield and Yar 1988).

For models with seasonal terms, the smoothing state is normalized so that the seasonal factors $S_{t-j}$ for $j = 0, \ldots, p-1$ sum to zero for models that assume additive seasonality and average to one for models (such as Winters method) that assume multiplicative seasonality.
Missing Values

When a missing value is encountered at time $t$, the smoothed values are updated using the error-correction form of the smoothing equations with the one-step-ahead prediction error, $e_t$, set to zero. The missing value is estimated using the one-step-ahead prediction at time $t-1$, that is, $\hat{Y}_{t-1}(1)$ (Aldrin 1989). The error-correction forms of each of the smoothing models are listed in the following sections.
Predictions and Prediction Errors

Predictions are made based on the last known smoothing state. Predictions made at time $t$ for $k$ steps ahead are denoted $\hat{Y}_t(k)$, and the associated prediction errors are denoted $e_t(k) = Y_{t+k} - \hat{Y}_t(k)$. The prediction equation for each smoothing model is listed in the following sections.

The one-step-ahead predictions refer to predictions made at time $t-1$ for one time unit into the future, that is, $\hat{Y}_{t-1}(1)$. The one-step-ahead prediction errors are more simply denoted $e_t = e_{t-1}(1) = Y_t - \hat{Y}_{t-1}(1)$. The one-step-ahead prediction errors are also the model residuals, and the sum of squares of the one-step-ahead prediction errors is the objective function used in smoothing weight optimization.
The variance of the prediction errors is used to calculate the confidence limits (Sweet 1985; McKenzie 1986; Yar and Chatfield 1990; Chatfield and Yar 1991). The equations for the variance of the prediction errors for each smoothing model are listed in the following sections.

Note: $\mathrm{var}(\epsilon_t)$ is estimated by the mean square of the one-step-ahead prediction errors.
Smoothing Weights

Depending on the smoothing model, the smoothing weights consist of the following:

$\alpha$ is a level smoothing weight.

$\gamma$ is a trend smoothing weight.

$\delta$ is a seasonal smoothing weight.

$\phi$ is a trend damping weight.
Larger smoothing weights (less damping) permit the more recent data to have a greater influence on the predictions. Smaller smoothing weights (more damping) give less weight to recent data.
Specifying the Smoothing Weights

Typically the smoothing weights are chosen to be from zero to one. (This is intuitive because the weights associated with the past smoothing state and the value of the current observation would normally sum to one.) However, each smoothing model (except Winters Method—Multiplicative Version) has an ARIMA equivalent. Weights chosen to be within the ARIMA additive-invertible region will guarantee stable predictions (Archibald 1990; Gardner 1985). The ARIMA equivalent and the additive-invertible region for each smoothing model are listed in the following sections.
Optimizing the Smoothing Weights

Smoothing weights are determined so as to minimize the sum of squared one-step-ahead prediction errors. The optimization is initialized by choosing from a predetermined grid the initial smoothing weights that result in the smallest sum of squared one-step-ahead prediction errors. The optimization process is highly dependent on this initialization. It is possible that the optimization process will fail due to the inability to obtain stable initial values for the smoothing weights (Greene 1993; Judge et al. 1980), and it is possible for the optimization to result in a local minimum.

The optimization process can result in weights being chosen outside both the zero-to-one range and the ARIMA additive-invertible region. By restricting weight optimization to the additive-invertible region, you can obtain a local minimum with stable predictions. Likewise, weight optimization can be restricted to the zero-to-one range or other ranges. It is also possible to fix certain weights to a specific value and optimize the remaining weights.
Standard Errors

The standard errors associated with the smoothing weights are calculated from the Hessian matrix of the sum of squared one-step-ahead prediction errors with respect to the smoothing weights used in the optimization process.

Weights Near Zero or One

Sometimes the optimization process results in weights near zero or one. For simple or double (Brown) exponential smoothing, a level weight near zero implies that simple differencing of the time series might be appropriate. For linear (Holt) exponential smoothing, a level weight near zero implies that the smoothed trend is constant and that an ARIMA model with deterministic trend might be a more appropriate model. For damped-trend linear exponential smoothing, a damping weight near one implies that linear (Holt) exponential smoothing might be a more appropriate model. For Winters method and seasonal exponential smoothing, a seasonal weight near one implies that a nonseasonal model might be more appropriate, and a seasonal weight near zero implies that deterministic seasonal factors might be present.
Equations for the Smoothing Models

Simple Exponential Smoothing

The model equation for simple exponential smoothing is

\[ Y_t = \mu_t + \epsilon_t \]

The smoothing equation is

\[ L_t = \alpha Y_t + (1 - \alpha) L_{t-1} \]

The error-correction form of the smoothing equation is

\[ L_t = L_{t-1} + \alpha e_t \]

(Note: For missing values, $e_t = 0$.) The $k$-step prediction equation is

\[ \hat{Y}_t(k) = L_t \]

The ARIMA model equivalency to simple exponential smoothing is the ARIMA(0,1,1) model,

\[ (1 - B) Y_t = (1 - \theta B) \epsilon_t, \qquad \theta = 1 - \alpha \]

The moving-average form of the equation is

\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \alpha \, \epsilon_{t-j} \]

For simple exponential smoothing, the additive-invertible region is

\[ \{ 0 < \alpha < 2 \} \]

The variance of the prediction errors is estimated as

\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t) \left[ 1 + \sum_{j=1}^{k-1} \alpha^2 \right] = \mathrm{var}(\epsilon_t) \left( 1 + (k-1)\alpha^2 \right) \]
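In batch practice, these smoothing models are usually fit with the ESM procedure, which performs the weight optimization described earlier. A minimal sketch, assuming a hypothetical monthly data set WORK.SALES with variables date and sales:

   /* Fit simple exponential smoothing and forecast 12 periods ahead */
   /* (data set and variable names are hypothetical)                 */
   proc esm data=work.sales out=work.fcst lead=12;
      id date interval=month;
      forecast sales / model=simple;
   run;

Other MODEL= values (for example, double, linear, damptrend, addwinters, and winters) select the other smoothing models described in the sections that follow.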
Double (Brown) Exponential Smoothing

The model equation for double exponential smoothing is

\[ Y_t = \mu_t + \beta_t t + \epsilon_t \]

The smoothing equations are

\[ L_t = \alpha Y_t + (1 - \alpha) L_{t-1} \]
\[ T_t = \alpha (L_t - L_{t-1}) + (1 - \alpha) T_{t-1} \]

This method can be equivalently described in terms of two successive applications of simple exponential smoothing:

\[ S_t^{[1]} = \alpha Y_t + (1 - \alpha) S_{t-1}^{[1]} \]
\[ S_t^{[2]} = \alpha S_t^{[1]} + (1 - \alpha) S_{t-1}^{[2]} \]

where $S_t^{[1]}$ are the smoothed values of $Y_t$, and $S_t^{[2]}$ are the smoothed values of $S_t^{[1]}$. The prediction equation then takes the form

\[ \hat{Y}_t(k) = (2 + \alpha k/(1-\alpha)) S_t^{[1]} - (1 + \alpha k/(1-\alpha)) S_t^{[2]} \]

The error-correction forms of the smoothing equations are

\[ L_t = L_{t-1} + T_{t-1} + \alpha e_t \]
\[ T_t = T_{t-1} + \alpha^2 e_t \]

(Note: For missing values, $e_t = 0$.) The $k$-step prediction equation is

\[ \hat{Y}_t(k) = L_t + ((k-1) + 1/\alpha) T_t \]

The ARIMA model equivalency to double exponential smoothing is the ARIMA(0,2,2) model,

\[ (1 - B)^2 Y_t = (1 - \theta B)^2 \epsilon_t, \qquad \theta = 1 - \alpha \]

The moving-average form of the equation is

\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \left( 2\alpha + (j-1)\alpha^2 \right) \epsilon_{t-j} \]

For double exponential smoothing, the additive-invertible region is

\[ \{ 0 < \alpha < 2 \} \]

The variance of the prediction errors is estimated as

\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t) \left[ 1 + \sum_{j=1}^{k-1} \left( 2\alpha + (j-1)\alpha^2 \right)^2 \right] \]
Linear (Holt) Exponential Smoothing

The model equation for linear exponential smoothing is

\[ Y_t = \mu_t + \beta_t t + \epsilon_t \]

The smoothing equations are

\[ L_t = \alpha Y_t + (1 - \alpha)(L_{t-1} + T_{t-1}) \]
\[ T_t = \gamma (L_t - L_{t-1}) + (1 - \gamma) T_{t-1} \]

The error-correction form of the smoothing equations is

\[ L_t = L_{t-1} + T_{t-1} + \alpha e_t \]
\[ T_t = T_{t-1} + \alpha \gamma e_t \]

(Note: For missing values, $e_t = 0$.) The $k$-step prediction equation is

\[ \hat{Y}_t(k) = L_t + k T_t \]

The ARIMA model equivalency to linear exponential smoothing is the ARIMA(0,2,2) model,

\[ (1 - B)^2 Y_t = (1 - \theta_1 B - \theta_2 B^2) \epsilon_t \]
\[ \theta_1 = 2 - \alpha - \alpha\gamma, \qquad \theta_2 = \alpha - 1 \]

The moving-average form of the equation is

\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} (\alpha + j \alpha \gamma) \epsilon_{t-j} \]

For linear exponential smoothing, the additive-invertible region is

\[ \{ 0 < \alpha < 2 \}, \quad \{ 0 < \gamma < 4/\alpha - 2 \} \]

The variance of the prediction errors is estimated as

\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t) \left[ 1 + \sum_{j=1}^{k-1} (\alpha + j \alpha \gamma)^2 \right] \]
Damped-Trend Linear Exponential Smoothing

The model equation for damped-trend linear exponential smoothing is

\[ Y_t = \mu_t + \beta_t t + \epsilon_t \]

The smoothing equations are

\[ L_t = \alpha Y_t + (1 - \alpha)(L_{t-1} + \phi T_{t-1}) \]
\[ T_t = \gamma (L_t - L_{t-1}) + (1 - \gamma) \phi T_{t-1} \]

The error-correction form of the smoothing equations is

\[ L_t = L_{t-1} + \phi T_{t-1} + \alpha e_t \]
\[ T_t = \phi T_{t-1} + \alpha \gamma e_t \]

(Note: For missing values, $e_t = 0$.) The $k$-step prediction equation is

\[ \hat{Y}_t(k) = L_t + \sum_{i=1}^{k} \phi^i T_t \]

The ARIMA model equivalency to damped-trend linear exponential smoothing is the ARIMA(1,1,2) model,

\[ (1 - \phi B)(1 - B) Y_t = (1 - \theta_1 B - \theta_2 B^2) \epsilon_t \]
\[ \theta_1 = 1 + \phi - \alpha - \alpha\gamma\phi, \qquad \theta_2 = (\alpha - 1)\phi \]

The moving-average form of the equation (assuming $|\phi| < 1$) is

\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \left( \alpha + \alpha\gamma\phi (\phi^j - 1)/(\phi - 1) \right) \epsilon_{t-j} \]

For damped-trend linear exponential smoothing, the additive-invertible region is

\[ \{ 0 < \alpha < 2 \}, \quad \{ 0 < \phi\gamma < 4/\alpha - 2 \} \]

The variance of the prediction errors is estimated as

\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t) \left[ 1 + \sum_{j=1}^{k-1} \left( \alpha + \alpha\gamma\phi (\phi^j - 1)/(\phi - 1) \right)^2 \right] \]
Seasonal Exponential Smoothing

The model equation for seasonal exponential smoothing is

\[ Y_t = \mu_t + s_p(t) + \epsilon_t \]

The smoothing equations are

\[ L_t = \alpha (Y_t - S_{t-p}) + (1 - \alpha) L_{t-1} \]
\[ S_t = \delta (Y_t - L_t) + (1 - \delta) S_{t-p} \]

The error-correction form of the smoothing equations is

\[ L_t = L_{t-1} + \alpha e_t \]
\[ S_t = S_{t-p} + \delta (1 - \alpha) e_t \]

(Note: For missing values, $e_t = 0$.) The $k$-step prediction equation is

\[ \hat{Y}_t(k) = L_t + S_{t-p+k} \]

The ARIMA model equivalency to seasonal exponential smoothing is the ARIMA(0,1,p+1)(0,1,0)$_p$ model,

\[ (1 - B)(1 - B^p) Y_t = (1 - \theta_1 B - \theta_2 B^p - \theta_3 B^{p+1}) \epsilon_t \]
\[ \theta_1 = 1 - \alpha, \qquad \theta_2 = 1 - \delta(1 - \alpha), \qquad \theta_3 = (1 - \alpha)(\delta - 1) \]

The moving-average form of the equation is

\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \psi_j \epsilon_{t-j} \]
\[ \psi_j = \begin{cases} \alpha, & \text{for } j \bmod p \ne 0 \\ \alpha + \delta(1-\alpha), & \text{for } j \bmod p = 0 \end{cases} \]

For seasonal exponential smoothing, the additive-invertible region is

\[ \{ \max(-p\alpha, 0) < \delta(1-\alpha) < (2 - \alpha) \} \]

The variance of the prediction errors is estimated as

\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t) \left[ 1 + \sum_{j=1}^{k-1} \psi_j^2 \right] \]
Multiplicative Seasonal Smoothing

In order to use the multiplicative version of seasonal smoothing, the time series and all predictions must be strictly positive. The model equation for the multiplicative version of seasonal smoothing is

\[ Y_t = \mu_t s_p(t) + \epsilon_t \]

The smoothing equations are

\[ L_t = \alpha (Y_t / S_{t-p}) + (1 - \alpha) L_{t-1} \]
\[ S_t = \delta (Y_t / L_t) + (1 - \delta) S_{t-p} \]

The error-correction form of the smoothing equations is

\[ L_t = L_{t-1} + \alpha e_t / S_{t-p} \]
\[ S_t = S_{t-p} + \delta (1 - \alpha) e_t / L_t \]

(Note: For missing values, $e_t = 0$.) The $k$-step prediction equation is

\[ \hat{Y}_t(k) = L_t S_{t-p+k} \]

The multiplicative version of seasonal smoothing does not have an ARIMA equivalent; however, when the seasonal variation is small, the ARIMA additive-invertible region of the additive version of seasonal smoothing described in the preceding section can approximate the stability region of the multiplicative version.

The variance of the prediction errors is estimated as

\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t) \left[ \sum_{i=0}^{\infty} \sum_{j=0}^{p-1} \left( \psi_{j+ip} S_{t+k} / S_{t+k-j} \right)^2 \right] \]

where $\psi_j$ are as described for the additive version of the seasonal method, and $\psi_j = 0$ for $j \ge k$.
Winters Method—Additive Version

The model equation for the additive version of Winters method is

\[ Y_t = \mu_t + \beta_t t + s_p(t) + \epsilon_t \]

The smoothing equations are

\[ L_t = \alpha (Y_t - S_{t-p}) + (1 - \alpha)(L_{t-1} + T_{t-1}) \]
\[ T_t = \gamma (L_t - L_{t-1}) + (1 - \gamma) T_{t-1} \]
\[ S_t = \delta (Y_t - L_t) + (1 - \delta) S_{t-p} \]

The error-correction form of the smoothing equations is

\[ L_t = L_{t-1} + T_{t-1} + \alpha e_t \]
\[ T_t = T_{t-1} + \alpha \gamma e_t \]
\[ S_t = S_{t-p} + \delta (1 - \alpha) e_t \]

(Note: For missing values, $e_t = 0$.) The $k$-step prediction equation is

\[ \hat{Y}_t(k) = L_t + k T_t + S_{t-p+k} \]

The ARIMA model equivalency to the additive version of Winters method is the ARIMA(0,1,p+1)(0,1,0)$_p$ model,

\[ (1 - B)(1 - B^p) Y_t = \left[ 1 - \sum_{i=1}^{p+1} \theta_i B^i \right] \epsilon_t \]

\[ \theta_j = \begin{cases} 1 - \alpha - \alpha\gamma, & j = 1 \\ -\alpha\gamma, & 2 \le j \le p-1 \\ 1 - \alpha\gamma - \delta(1-\alpha), & j = p \\ (1-\alpha)(\delta - 1), & j = p+1 \end{cases} \]

The moving-average form of the equation is

\[ Y_t = \epsilon_t + \sum_{j=1}^{\infty} \psi_j \epsilon_{t-j} \]
\[ \psi_j = \begin{cases} \alpha + j\alpha\gamma, & \text{for } j \bmod p \ne 0 \\ \alpha + j\alpha\gamma + \delta(1-\alpha), & \text{for } j \bmod p = 0 \end{cases} \]

For the additive version of Winters method (see Archibald 1990), the additive-invertible region is

\[ \{ \max(-p\alpha, 0) < \delta(1-\alpha) < (2 - \alpha) \} \]
\[ \{ 0 < \alpha\gamma < 2 - \alpha - \delta(1-\alpha)(1 - \cos(\vartheta)) \} \]

where $\vartheta$ is the smallest nonnegative solution to the equations listed in Archibald (1990).

The variance of the prediction errors is estimated as

\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t) \left[ 1 + \sum_{j=1}^{k-1} \psi_j^2 \right] \]
Winters Method—Multiplicative Version

In order to use the multiplicative version of Winters method, the time series and all predictions must be strictly positive. The model equation for the multiplicative version of Winters method is

\[ Y_t = (\mu_t + \beta_t t) s_p(t) + \epsilon_t \]

The smoothing equations are

\[ L_t = \alpha (Y_t / S_{t-p}) + (1 - \alpha)(L_{t-1} + T_{t-1}) \]
\[ T_t = \gamma (L_t - L_{t-1}) + (1 - \gamma) T_{t-1} \]
\[ S_t = \delta (Y_t / L_t) + (1 - \delta) S_{t-p} \]

The error-correction form of the smoothing equations is

\[ L_t = L_{t-1} + T_{t-1} + \alpha e_t / S_{t-p} \]
\[ T_t = T_{t-1} + \alpha \gamma e_t / S_{t-p} \]
\[ S_t = S_{t-p} + \delta (1 - \alpha) e_t / L_t \]

(Note: For missing values, $e_t = 0$.) The $k$-step prediction equation is

\[ \hat{Y}_t(k) = (L_t + k T_t) S_{t-p+k} \]

The multiplicative version of Winters method does not have an ARIMA equivalent; however, when the seasonal variation is small, the ARIMA additive-invertible region of the additive version of Winters method described in the preceding section can approximate the stability region of the multiplicative version.

The variance of the prediction errors is estimated as

\[ \mathrm{var}(e_t(k)) = \mathrm{var}(\epsilon_t) \left[ \sum_{i=0}^{\infty} \sum_{j=0}^{p-1} \left( \psi_{j+ip} S_{t+k} / S_{t+k-j} \right)^2 \right] \]

where $\psi_j$ are as described for the additive version of Winters method, and $\psi_j = 0$ for $j \ge k$.
ARIMA Models

Autoregressive integrated moving-average (ARIMA) models predict values of a dependent time series with a linear combination of its own past values, past errors (also called shocks or innovations), and current and past values of other time series (predictor time series).

The Time Series Forecasting System uses the ARIMA procedure of SAS/ETS software to fit and forecast ARIMA models. The maximum likelihood method is used for parameter estimation. Refer to Chapter 7, “The ARIMA Procedure,” for details of ARIMA model estimation and forecasting. This section summarizes the notation used for ARIMA models.
Notation for ARIMA Models

A dependent time series that is modeled as a linear combination of its own past values and past values of an error series is known as a (pure) ARIMA model.
Nonseasonal ARIMA Model Notation

The order of an ARIMA model is usually denoted by the notation ARIMA(p,d,q), where

p is the order of the autoregressive part.

d is the order of the differencing (rarely should $d > 2$ be needed).

q is the order of the moving-average process.

Given a dependent time series $\{Y_t : 1 \le t \le n\}$, mathematically the ARIMA model is written as

\[ (1 - B)^d Y_t = \mu + \frac{\theta(B)}{\phi(B)} a_t \]

where

$t$ indexes time.

$\mu$ is the mean term.

$B$ is the backshift operator; that is, $B X_t = X_{t-1}$.

$\phi(B)$ is the autoregressive operator, represented as a polynomial in the backshift operator: $\phi(B) = 1 - \phi_1 B - \ldots - \phi_p B^p$.

$\theta(B)$ is the moving-average operator, represented as a polynomial in the backshift operator: $\theta(B) = 1 - \theta_1 B - \ldots - \theta_q B^q$.

$a_t$ is the independent disturbance, also called the random error.

For example, the mathematical form of the ARIMA(1,1,2) model is

\[ (1 - B) Y_t = \mu + \frac{(1 - \theta_1 B - \theta_2 B^2)}{(1 - \phi_1 B)} a_t \]
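An ARIMA(1,1,2) model of this form can be fit with the ARIMA procedure. A minimal sketch, assuming a hypothetical monthly data set WORK.SERIES with variables date and y:

   /* Fit and forecast an ARIMA(1,1,2) model (hypothetical data set) */
   proc arima data=work.series;
      identify var=y(1);          /* first difference: d=1  */
      estimate p=1 q=2;           /* AR order 1, MA order 2 */
      forecast lead=12 id=date interval=month out=work.fcst;
   run;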
Seasonal ARIMA Model Notation

Seasonal ARIMA models are expressed in factored form by the notation ARIMA(p,d,q)(P,D,Q)$_s$, where

P is the order of the seasonal autoregressive part.

D is the order of the seasonal differencing (rarely should $D > 1$ be needed).

Q is the order of the seasonal moving-average process.

s is the length of the seasonal cycle.

Given a dependent time series $\{Y_t : 1 \le t \le n\}$, mathematically the ARIMA seasonal model is written as

\[ (1 - B)^d (1 - B^s)^D Y_t = \mu + \frac{\theta(B)\,\theta_s(B^s)}{\phi(B)\,\phi_s(B^s)} a_t \]

where

$\phi_s(B^s)$ is the seasonal autoregressive operator, represented as a polynomial in the backshift operator: $\phi_s(B^s) = 1 - \phi_{s,1} B^s - \ldots - \phi_{s,P} B^{sP}$.

$\theta_s(B^s)$ is the seasonal moving-average operator, represented as a polynomial in the backshift operator: $\theta_s(B^s) = 1 - \theta_{s,1} B^s - \ldots - \theta_{s,Q} B^{sQ}$.

For example, the mathematical form of the ARIMA(1,0,1)(1,1,2)$_{12}$ model is

\[ (1 - B^{12}) Y_t = \mu + \frac{(1 - \theta_1 B)(1 - \theta_{s,1} B^{12} - \theta_{s,2} B^{24})}{(1 - \phi_1 B)(1 - \phi_{s,1} B^{12})} a_t \]
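The corresponding factored specification in the ARIMA procedure uses a lag list for each factor. A sketch for the ARIMA(1,0,1)(1,1,2)12 model above; the data set and variable names are hypothetical:

   /* Fit a factored seasonal model: ARIMA(1,0,1)(1,1,2)12 */
   proc arima data=work.series;
      identify var=y(12);                /* seasonal difference: D=1, s=12 */
      estimate p=(1)(12) q=(1)(12,24);   /* factored AR and MA polynomials */
      forecast lead=24 id=date interval=month out=work.fcst;
   run;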
Abbreviated Notation for ARIMA Models

If the differencing order, autoregressive order, or moving-average order is zero, the notation is further abbreviated as follows:

I(d)(D)$_s$
integrated model or ARIMA(0,d,0)(0,D,0)$_s$

AR(p)(P)$_s$
autoregressive model or ARIMA(p,0,0)(P,0,0)$_s$

IAR(p,d)(P,D)$_s$
integrated autoregressive model or ARIMA(p,d,0)(P,D,0)$_s$

MA(q)(Q)$_s$
moving-average model or ARIMA(0,0,q)(0,0,Q)$_s$

IMA(d,q)(D,Q)$_s$
integrated moving-average model or ARIMA(0,d,q)(0,D,Q)$_s$

ARMA(p,q)(P,Q)$_s$
autoregressive moving-average model or ARIMA(p,0,q)(P,0,Q)$_s$
Notation for Transfer Functions

A transfer function can be used to filter a predictor time series to form a dynamic regression model. Let $Y_t$ be the dependent series, let $X_t$ be the predictor series, and let $\Psi(B)$ be a linear filter or transfer function for the effect of $X_t$ on $Y_t$. The ARIMA model is then

\[ (1 - B)^d (1 - B^s)^D Y_t = \mu + \Psi(B)(1 - B)^d (1 - B^s)^D X_t + \frac{\theta(B)\,\theta_s(B^s)}{\phi(B)\,\phi_s(B^s)} a_t \]

This model is called a dynamic regression of $Y_t$ on $X_t$.

Nonseasonal Transfer Function Notation

Given the $i$th predictor time series $\{X_{i,t} : 1 \le t \le n\}$, the transfer function is written as Dif($d_i$)Lag($k_i$)N($q_i$)/D($p_i$), where

$d_i$ is the simple order of the differencing for the $i$th predictor time series, $(1 - B)^{d_i} X_{i,t}$ (rarely should $d_i > 2$ be needed).

$k_i$ is the pure time delay (lag) for the effect of the $i$th predictor time series, $X_{i,t} B^{k_i} = X_{i,t-k_i}$.

$p_i$ is the simple order of the denominator for the $i$th predictor time series.

$q_i$ is the simple order of the numerator for the $i$th predictor time series.

The mathematical notation used to describe a transfer function is

\[ \Psi_i(B) = \frac{\omega_i(B)}{\delta_i(B)} (1 - B)^{d_i} B^{k_i} \]

where

$B$ is the backshift operator; that is, $B X_t = X_{t-1}$.

$\delta_i(B)$ is the denominator polynomial of the transfer function for the $i$th predictor time series: $\delta_i(B) = 1 - \delta_{i,1} B - \ldots - \delta_{i,p_i} B^{p_i}$.

$\omega_i(B)$ is the numerator polynomial of the transfer function for the $i$th predictor time series: $\omega_i(B) = 1 - \omega_{i,1} B - \ldots - \omega_{i,q_i} B^{q_i}$.

The numerator factors for a transfer function for a predictor series are like the MA part of the ARMA model for the noise series. The denominator factors for a transfer function for a predictor series are like the AR part of the ARMA model for the noise series. Denominator factors introduce exponentially weighted, infinite distributed lags into the transfer function.

For example, the transfer function for the $i$th predictor time series with

$k_i = 3$ (time lag is 3)
$d_i = 1$ (simple order of differencing is one)
$p_i = 1$ (simple order of the denominator is one)
$q_i = 2$ (simple order of the numerator is two)

would be written as [Dif(1)Lag(3)N(2)/D(1)]. The mathematical notation for the transfer function in this example is

\[ \Psi_i(B) = \frac{(1 - \omega_{i,1} B - \omega_{i,2} B^2)}{(1 - \delta_{i,1} B)} (1 - B) B^3 \]

Seasonal Transfer Function Notation

The general transfer function notation for the $i$th predictor time series $X_{i,t}$ with seasonal factors is [Dif($d_i$)($D_i$)$_s$ Lag($k_i$) N($q_i$)($Q_i$)$_s$ / D($p_i$)($P_i$)$_s$], where

$D_i$ is the seasonal order of the differencing for the $i$th predictor time series (rarely should $D_i > 1$ be needed).

$P_i$ is the seasonal order of the denominator for the $i$th predictor time series (rarely should $P_i > 2$ be needed).

$Q_i$ is the seasonal order of the numerator for the $i$th predictor time series (rarely should $Q_i > 2$ be needed).

$s$ is the length of the seasonal cycle.

The mathematical notation used to describe a seasonal transfer function is

\[ \Psi_i(B) = \frac{\omega_i(B)\,\omega_{s,i}(B^s)}{\delta_i(B)\,\delta_{s,i}(B^s)} (1 - B)^{d_i} (1 - B^s)^{D_i} B^{k_i} \]

where

$\delta_{s,i}(B^s)$ is the denominator seasonal polynomial of the transfer function for the $i$th predictor time series: $\delta_{s,i}(B^s) = 1 - \delta_{s,i,1} B^s - \ldots - \delta_{s,i,P_i} B^{sP_i}$.

$\omega_{s,i}(B^s)$ is the numerator seasonal polynomial of the transfer function for the $i$th predictor time series: $\omega_{s,i}(B^s) = 1 - \omega_{s,i,1} B^s - \ldots - \omega_{s,i,Q_i} B^{sQ_i}$.

For example, the transfer function for the $i$th predictor time series $X_{i,t}$ whose seasonal cycle is $s = 12$ with

$d_i = 2$ (simple order of differencing is two)
$D_i = 1$ (seasonal order of differencing is one)
$q_i = 2$ (simple order of the numerator is two)
$Q_i = 1$ (seasonal order of the numerator is one)

would be written as [Dif(2)(1)$_s$ N(2)(1)$_s$]. The mathematical notation for the transfer function in this example is

\[ \Psi_i(B) = (1 - \omega_{i,1} B - \omega_{i,2} B^2)(1 - \omega_{s,i,1} B^{12})(1 - B)^2 (1 - B^{12}) \]

Note: In this case, [Dif(2)(1)$_s$ N(2)(1)$_s$] = [Dif(2)(1)$_s$ Lag(0)N(2)(1)$_s$/D(0)(0)$_s$].
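In the ARIMA procedure, a transfer function of this kind corresponds to an INPUT= specification in the ESTIMATE statement, with the differencing given in the CROSSCORR= option. A sketch for the nonseasonal example [Dif(1)Lag(3)N(2)/D(1)], with hypothetical series y and x:

   /* Dynamic regression with transfer function Dif(1)Lag(3)N(2)/D(1) */
   proc arima data=work.series;
      identify var=y(1) crosscorr=x(1);    /* difference y and x once        */
      estimate input=( 3 $ (1,2)/(1) x );  /* shift 3, numerator lags 1 and 2, */
                                           /* denominator lag 1               */
      forecast lead=12 id=date interval=month out=work.fcst;
   run;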
Predictor Series

This section discusses time trend curves, seasonal dummies, interventions, and adjustments.
Time Trend Curves

When you specify a time trend curve as a predictor in a forecasting model, the system computes a predictor series that is a deterministic function of time. This variable is then included in the model as a regressor, and the trend curve is fit to the dependent series by linear regression, in addition to other predictor series.

Some kinds of nonlinear trend curves are fit by transforming the dependent series. For example, the exponential trend curve is actually a linear time trend fit to the logarithm of the series. For these trend curve specifications, the series transformation option is set automatically, and you cannot independently control both the time trend curve and transformation option.

The computed time trend variable is included in the output data set in a variable named in accordance with the trend curve type. Let $t$ represent the observation count from the start of the period of fit for the model, and let $X_t$ represent the value of the time trend variable at observation $t$ within the period of fit. The names and definitions of these variables are as follows. (Note: These deterministic variables are reserved variable names.)

Linear trend
variable name _LINEAR_, with $X_t = t - c$

Quadratic trend
variable name _QUAD_, with $X_t = (t - c)^2$. Note that a quadratic trend implies a linear trend as a special case and results in two regressors: _QUAD_ and _LINEAR_.

Cubic trend
variable name _CUBE_, with $X_t = (t - c)^3$. Note that a cubic trend implies a quadratic trend as a special case and results in three regressors: _CUBE_, _QUAD_, and _LINEAR_.

Logistic trend
variable name _LOGIT_, with $X_t = t$. The model is a linear time trend applied to the logistic transform of the dependent series. Thus, specifying a logistic trend is equivalent to specifying the logistic series transformation and a linear time trend. A logistic trend predictor can be used only in conjunction with the logistic transformation, which is set automatically when you specify logistic trend.
Logarithmic trend
variable name _LOG_, with Xt D ln.t/
Exponential trend
variable name _EXP_, with Xt D t. The model is a linear time trend applied to the logarithms of the dependent series. Thus, specifying an exponential trend is equivalent to specifying the log series transformation and a linear time trend. An exponential trend predictor can be used only in conjunction with the log transformation, which is set automatically when you specify exponential trend.
Hyperbolic trend
variable name _HYP_, with Xt D 1=t
Power curve trend
variable name _POW_, with Xt D ln.t/. The model is a logarithmic time trend applied to the logarithms of the dependent series. Thus, specifying a power curve is equivalent to specifying the log series transformation and a logarithmic time trend. A power curve predictor can be used only in conjunction with the log transformation, which is set automatically when you specify a power curve trend.
EXP(A+B/TIME) trend
variable name _ERT_, with Xt D 1=t. The model is a hyperbolic time trend applied to the logarithms of the dependent series. Thus, specifying this trend curve is equivalent to specifying the log series transformation and a hyperbolic time trend. This trend curve can be used only in conjunction with the log transformation, which is set automatically when you specify this trend.
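The following DATA step sketches how such deterministic trend variables could be computed by hand (a minimal illustration only, not the system's internal code; the input data set name is hypothetical, and the centering constant c is taken as zero for simplicity):

   data trend_vars;
      set work.series;          /* hypothetical input series */
      t = _n_;                  /* observation count within the period of fit */
      c = 0;                    /* centering constant; assumed zero here */
      _LINEAR_ = t - c;         /* linear trend */
      _QUAD_   = (t - c)**2;    /* quadratic trend (used together with _LINEAR_) */
      _CUBE_   = (t - c)**3;    /* cubic trend (used with _QUAD_ and _LINEAR_) */
      _LOG_    = log(t);        /* logarithmic trend */
      _HYP_    = 1/t;           /* hyperbolic trend */
   run;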
Intervention Effects

Interventions are used for modeling events that occur at specific times. That is, they are known changes that affect the dependent series or outliers. The ith intervention series is included in the output data set with variable name _INTVi_, which is a reserved variable name.

Point Interventions

The point intervention is a one-time event. The ith intervention series X_{i,t} has a point intervention at time t_int when the series is nonzero only at time t_int; that is,

X_{i,t} = 1 if t = t_int, and X_{i,t} = 0 otherwise

Step Interventions

Step interventions are continuing, and the input time series flags periods after the intervention. For a step intervention, the ith intervention series X_{i,t} is zero before time t_int and steps to a constant level thereafter; that is,

X_{i,t} = 1 if t ≥ t_int, and X_{i,t} = 0 otherwise
Ramp Interventions

A ramp intervention is a continuing intervention that increases linearly after the intervention time. For a ramp intervention, the ith intervention series X_{i,t} is zero before time t_int and increases linearly (proportional to time) thereafter; that is,

X_{i,t} = t − t_int if t ≥ t_int, and X_{i,t} = 0 otherwise
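To make the three definitions concrete, the following DATA step sketches how point, step, and ramp series could be constructed by hand for a daily series (a minimal illustration with a hypothetical data set and intervention date; the system creates the _INTVi_ variables automatically):

   data intv;
      set work.series;                 /* hypothetical input series with a DATE variable */
      tint  = '08mar1990'd;            /* intervention date used in the examples below */
      point = (date =  tint);          /* 1 at the intervention time only */
      step  = (date >= tint);          /* 0 before the intervention, 1 thereafter */
      ramp  = max(date - tint, 0);     /* t - tint after the intervention, 0 before */
   run;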
Intervention Effect

Given the ith intervention series X_{i,t}, you can define how the intervention takes effect by filters (transfer functions) of the form

Ψ_i(B) = (1 − ω_{i,1}B − … − ω_{i,q_i}B^{q_i}) / (1 − δ_{i,1}B − … − δ_{i,p_i}B^{p_i})

where B is the backshift operator, B y_t = y_{t−1}.
The denominator of the transfer function determines the decay pattern of the intervention effect, whereas the numerator terms determine the size of the intervention effect time window. For example, the following intervention effects are associated with the respective transfer functions.

Immediately: Ψ_i(B) = 1
Gradually: Ψ_i(B) = 1/(1 − δ_{i,1}B)
1 lag window: Ψ_i(B) = 1 − ω_{i,1}B
3 lag window: Ψ_i(B) = 1 − ω_{i,1}B − ω_{i,2}B^2 − ω_{i,3}B^3
Intervention Notation

The notation used to describe intervention effects has the form type:t_int(q_i)/(p_i), where type is point, step, or ramp; t_int is the time of the intervention (for example, OCT87); q_i is the transfer function numerator order; and p_i is the transfer function denominator order. If q_i = 0, the part "(q_i)" is omitted; if p_i = 0, the part "/(p_i)" is omitted.

In the Intervention Specification window, the Number of Lags option specifies the transfer function numerator order q_i, and the Effect Decay Pattern option specifies the transfer function denominator order p_i. In the Effect Decay Pattern options, the values and resulting p_i are: None, p_i = 0; Exp, p_i = 1; Wave, p_i = 2.

For example, a step intervention with date 08MAR90 and effect pattern Exp is denoted "Step:08MAR90/(1)" and has a transfer function filter Ψ_i(B) = 1/(1 − δ_1 B). A ramp intervention immediately applied on 08MAR90 is denoted "Ramp:08MAR90" and has a transfer function filter Ψ_i(B) = 1.
Seasonal Dummy Inputs

For a seasonal cycle of length s, the seasonal dummy regressors include

{X_{i,t} : 1 ≤ i ≤ (s − 1), 1 ≤ t ≤ n}

for models that include an intercept term, and

{X_{i,t} : 1 ≤ i ≤ s, 1 ≤ t ≤ n}

for models that exclude an intercept term. Each element of a seasonal dummy regressor is either zero or one, based on the following rule:

X_{i,t} = 1 when i = t mod s, and X_{i,t} = 0 otherwise

Note that if the model includes an intercept term, the number of seasonal dummy regressors is one less than s to ensure that the linear system is full rank.

The seasonal dummy variables are included in the output data set with variable names prefixed with "SDUMMYi" and sequentially numbered. They are reserved variable names.
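A minimal DATA step sketch of this rule for monthly data (s = 12) and a model that includes an intercept, so s − 1 dummies are generated (the data set and variable names are hypothetical):

   data sdum;
      set work.series;                 /* hypothetical input series */
      t = _n_;                         /* observation count */
      array sd{11} sdummy1-sdummy11;   /* s - 1 dummies when the model has an intercept */
      do i = 1 to 11;
         sd{i} = (mod(t, 12) = i);     /* 1 when i = t mod 12, 0 otherwise */
      end;
      drop i;
   run;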
Series Diagnostic Tests

This section describes the diagnostic tests that are used to determine the kinds of forecasting models appropriate for a series.

The series diagnostics are a set of heuristics that provide recommendations on whether or not the forecasting model should contain a log transform, trend terms, and seasonal terms. These recommendations are used by the automatic model selection process to restrict the model search to a subset of the model selection list. (You can disable this behavior by using the Automatic Model Selection Options window.)

The tests that are used by the series diagnostics do not always produce the correct classification of the series. They are intended to accelerate the process of searching for a good forecasting model for the series, but you should not rely on them if finding the very best model is important to you.

If you have information about the appropriate kinds of forecasting models (perhaps from studying the plots and autocorrelations shown in the Series Viewer window), you can set the series diagnostic flags in the Series Diagnostics window. Select the YES, NO, or MAYBE values for the Log Transform, Trend, and Seasonality options in the Series Diagnostics window as you think appropriate.

The series diagnostics tests are intended as a heuristic tool only, and no statistical validity is claimed for them. These tests might be modified and enhanced in future releases of the Time Series Forecasting System. The testing strategy is as follows:
1. Log transform test. The log test fits a high-order autoregressive model to the series and to the log of the series and compares goodness-of-fit measures for the prediction errors of the two models. If this test finds that log transforming the series is suitable, the Log Transform option is set to YES, and the subsequent diagnostic tests are performed on the log transformed series.

2. Trend test. The resultant series is tested for presence of a trend by using an augmented Dickey-Fuller test and a random walk with drift test. If either test finds that the series appears to have a trend, the Trend option is set to YES, and the subsequent diagnostic tests are performed on the differenced series. (A sketch of a comparable unit root test appears after this list.)

3. Seasonality test. The resultant series is tested for seasonality. A seasonal dummy model with AR(1) errors is fit and the joint significance of the seasonal dummy estimates is tested. If the seasonal dummies are significant, the AIC statistic for this model is compared to the AIC for an AR(1) model without seasonal dummies. If the AIC for the seasonal model is lower than that of the nonseasonal model, the Seasonal option is set to YES.
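Although the diagnostics run automatically inside the system, a comparable trend check can be run directly in SAS/ETS; for example, PROC ARIMA can compute augmented Dickey-Fuller unit root tests (a sketch only, with a hypothetical data set and series name; this is not the diagnostics' exact implementation):

   proc arima data=work.series;                    /* hypothetical input data set */
      identify var=y stationarity=(adf=(0,1,2));   /* augmented Dickey-Fuller tests
                                                      with 0, 1, and 2 augmenting lags */
   run;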
Statistics of Fit

This section explains the goodness-of-fit statistics reported to measure how well different models fit the data. The statistics of fit for the various forecasting models can be viewed or stored in a data set by using the Model Viewer window.

Statistics of fit are computed by using the actual and forecasted values for observations in the period of evaluation. One-step forecasted values are used whenever possible, including the case when a hold-out sample contains no missing values. If a one-step forecast for an observation cannot be computed due to missing values for previous series observations, a multi-step forecast is computed, using the minimum number of steps that the previous nonmissing values in the data range permit.

The various statistics of fit reported are as follows. In these formulas, n is the number of nonmissing observations and k is the number of fitted parameters in the model.

Number of Nonmissing Observations
The number of nonmissing observations used to fit the model.

Number of Observations
The total number of observations used to fit the model, including both missing and nonmissing observations.

Number of Missing Actuals
The number of missing actual values.

Number of Missing Predicted Values
The number of missing predicted values.

Number of Model Parameters
The number of parameters fit to the data. For combined forecasts, this is the number of forecast components.
Total Sum of Squares (Uncorrected)
The total sum of squares for the series, SST, uncorrected for the mean: Σ_{t=1}^{n} y_t^2.

Total Sum of Squares (Corrected)
The total sum of squares for the series, SST, corrected for the mean: Σ_{t=1}^{n} (y_t − ȳ)^2, where ȳ is the series mean.

Sum of Square Errors
The sum of the squared prediction errors, SSE = Σ_{t=1}^{n} (y_t − ŷ_t)^2, where ŷ_t is the one-step predicted value.

Mean Squared Error
The mean squared prediction error, MSE, calculated from the one-step-ahead forecasts: MSE = (1/n)SSE. This formula enables you to evaluate small hold-out samples.

Root Mean Squared Error
The root mean square error, RMSE = √MSE.

Mean Absolute Percent Error
The mean absolute percent prediction error, MAPE = (100/n) Σ_{t=1}^{n} |(y_t − ŷ_t)/y_t|. The summation ignores observations where y_t = 0.

Mean Absolute Error
The mean absolute prediction error, MAE = (1/n) Σ_{t=1}^{n} |y_t − ŷ_t|.

R-Square
The R^2 statistic, R^2 = 1 − SSE/SST. If the model fits the series badly, the model error sum of squares, SSE, can be larger than SST and the R^2 statistic will be negative.

Adjusted R-Square
The adjusted R^2 statistic, 1 − ((n−1)/(n−k))(1 − R^2).

Amemiya's Adjusted R-Square
Amemiya's adjusted R^2, 1 − ((n+k)/(n−k))(1 − R^2).

Random Walk R-Square
The random walk R^2 statistic (Harvey's R^2 statistic that uses the random walk model for comparison), 1 − ((n−1)/n)(SSE/RWSSE), where RWSSE = Σ_{t=2}^{n} (y_t − y_{t−1} − μ)^2 and μ = (1/(n−1)) Σ_{t=2}^{n} (y_t − y_{t−1}).

Akaike's Information Criterion
Akaike's information criterion (AIC), n ln(MSE) + 2k.

Schwarz Bayesian Information Criterion
Schwarz Bayesian information criterion (SBC or BIC), n ln(MSE) + k ln(n).

Amemiya's Prediction Criterion
Amemiya's prediction criterion, (1/n) SST ((n+k)/(n−k))(1 − R^2) = ((n+k)/(n−k))(1/n)SSE.

Maximum Error
The largest prediction error.

Minimum Error
The smallest prediction error.

Maximum Percent Error
The largest percent prediction error, 100 max((y_t − ŷ_t)/y_t). The summation ignores observations where y_t = 0.

Minimum Percent Error
The smallest percent prediction error, 100 min((y_t − ŷ_t)/y_t). The summation ignores observations where y_t = 0.

Mean Error
The mean prediction error, (1/n) Σ_{t=1}^{n} (y_t − ŷ_t).

Mean Percent Error
The mean percent prediction error, (100/n) Σ_{t=1}^{n} (y_t − ŷ_t)/y_t. The summation ignores observations where y_t = 0.
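For example, given a data set of actual values y and one-step predictions yhat, several of these statistics can be computed directly in a DATA step (a sketch under the assumption of hypothetical data set and variable names; it mirrors the formulas above but omits the missing-value and hold-out handling that the system performs):

   data _null_;
      set work.forecasts end=last;     /* hypothetical data set with y and yhat */
      if nmiss(y, yhat) = 0 then do;
         n + 1;                        /* number of nonmissing observations */
         sse + (y - yhat)**2;          /* sum of squared errors */
         sae + abs(y - yhat);          /* sum of absolute errors */
         if y ne 0 then sape + abs((y - yhat) / y);
      end;
      if last then do;
         k = 2;                        /* number of fitted parameters (assumed) */
         mse  = sse / n;               /* mean squared error */
         rmse = sqrt(mse);             /* root mean squared error */
         mae  = sae / n;               /* mean absolute error */
         mape = 100 * sape / n;        /* mean absolute percent error */
         aic  = n*log(mse) + 2*k;      /* Akaike's information criterion */
         sbc  = n*log(mse) + k*log(n); /* Schwarz Bayesian information criterion */
         put mse= rmse= mae= mape= aic= sbc=;
      end;
   run;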
References

Akaike, H. (1974), "A New Look at the Statistical Model Identification," IEEE Transactions on Automatic Control, AC-19, 716–723.

Aldrin, M. and Damsleth, E. (1989), "Forecasting Nonseasonal Time Series with Missing Observations," Journal of Forecasting, 8, 97–116.

Anderson, T.W. (1971), The Statistical Analysis of Time Series, New York: John Wiley & Sons.

Ansley, C. (1979), "An Algorithm for the Exact Likelihood of a Mixed Autoregressive Moving-Average Process," Biometrika, 66, 59.

Ansley, C. and Newbold, P. (1980), "Finite Sample Properties of Estimators for Autoregressive Moving-Average Models," Journal of Econometrics, 13, 159.

Archibald, B.C. (1990), "Parameter Space of the Holt-Winters Model," International Journal of Forecasting, 6, 199–209.

Bartolomei, S.M. and Sweet, A.L. (1989), "A Note on the Comparison of Exponential Smoothing Methods for Forecasting Seasonal Series," International Journal of Forecasting, 5, 111–116.

Bhansali, R.J. (1980), "Autoregressive and Window Estimates of the Inverse Correlation Function," Biometrika, 67, 551–566.

Bowerman, B.L. and O'Connell, R.T. (1979), Time Series and Forecasting: An Applied Approach, North Scituate, Massachusetts: Duxbury Press.

Box, G.E.P. and Cox, D.R. (1964), "An Analysis of Transformations," Journal of the Royal Statistical Society, Series B, 26, 211–243.

Box, G.E.P. and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, Revised Edition, San Francisco: Holden-Day.

Box, G.E.P. and Tiao, G.C. (1975), "Intervention Analysis with Applications to Economic and Environmental Problems," Journal of the American Statistical Association, 70, 70–79.

Brocklebank, J.C. and Dickey, D.A. (1986), SAS System for Forecasting Time Series, 1986 Edition, Cary, North Carolina: SAS Institute Inc.

Brown, R.G. (1962), Smoothing, Forecasting, and Prediction of Discrete Time Series, New York: Prentice-Hall.

Brown, R.G. and Meyer, R.F. (1961), "The Fundamental Theorem of Exponential Smoothing," Operations Research, 9, 673–685.

Chatfield, C. (1978), "The Holt-Winters Forecasting Procedure," Applied Statistics, 27, 264–279.

Chatfield, C. and Prothero, D.L. (1973), "Box-Jenkins Seasonal Forecasting: Problems in a Case Study," Journal of the Royal Statistical Society, Series A, 136, 295–315.

Chatfield, C. and Yar, M. (1988), "Holt-Winters Forecasting: Some Practical Issues," The Statistician, 37, 129–140.

Chatfield, C. and Yar, M. (1991), "Prediction Intervals for Multiplicative Holt-Winters," International Journal of Forecasting, 7, 31–37.

Cogger, K.O. (1974), "The Optimality of General-Order Exponential Smoothing," Operations Research, 22, 858.

Cox, D.R. (1961), "Prediction by Exponentially Weighted Moving Averages and Related Methods," Journal of the Royal Statistical Society, Series B, 23, 414–422.

Davidson, J. (1981), "Problems with the Estimation of Moving-Average Models," Journal of Econometrics, 16, 295.

Dickey, D.A. and Fuller, W.A. (1979), "Distribution of the Estimators for Autoregressive Time Series with a Unit Root," Journal of the American Statistical Association, 74(366), 427–431.

Dickey, D.A., Hasza, D.P., and Fuller, W.A. (1984), "Testing for Unit Roots in Seasonal Time Series," Journal of the American Statistical Association, 79(386), 355–367.

Fair, R.C. (1986), "Evaluating the Predictive Accuracy of Models," in Handbook of Econometrics, Vol. 3, Griliches, Z. and Intriligator, M.D., eds., New York: North Holland.

Fildes, R. (1979), "Quantitative Forecasting—the State of the Art: Extrapolative Models," Journal of Operational Research Society, 30, 691–710.

Fuller, W.A. (1976), Introduction to Statistical Time Series, New York: John Wiley & Sons.

Gardner, E.S., Jr. (1984), "The Strange Case of the Lagging Forecasts," Interfaces, 14, 47–50.

Gardner, E.S., Jr. (1985), "Exponential Smoothing: The State of the Art," Journal of Forecasting, 4, 1–38.

Granger, C.W.J. and Newbold, P. (1977), Forecasting Economic Time Series, New York: Academic Press.

Greene, W.H. (1993), Econometric Analysis, Second Edition, New York: Macmillan Publishing Company.

Hamilton, J.D. (1994), Time Series Analysis, Princeton: Princeton University Press.

Harvey, A.C. (1981), Time Series Models, New York: John Wiley & Sons.

Harvey, A.C. (1984), "A Unified View of Statistical Forecasting Procedures," Journal of Forecasting, 3, 245–275.

Hopewood, W.S., McKeown, J.C., and Newbold, P. (1984), "Time Series Forecasting Models Involving Power Transformations," Journal of Forecasting, 3(1), 57–61.

Jones, Richard H. (1980), "Maximum Likelihood Fitting of ARMA Models to Time Series with Missing Observations," Technometrics, 22, 389–396.

Judge, G.G., Griffiths, W.E., Hill, R.C., and Lee, T.C. (1980), The Theory and Practice of Econometrics, New York: John Wiley & Sons.

Ledolter, J. and Abraham, B. (1984), "Some Comments on the Initialization of Exponential Smoothing," Journal of Forecasting, 3, 79–84.

Ljung, G.M. and Box, G.E.P. (1978), "On a Measure of Lack of Fit in Time Series Models," Biometrika, 65, 297–303.

Makridakis, S., Wheelwright, S.C., and McGee, V.E. (1983), Forecasting: Methods and Applications, Second Edition, New York: John Wiley & Sons.

McKenzie, Ed (1984), "General Exponential Smoothing and the Equivalent ARMA Process," Journal of Forecasting, 3, 333–344.

McKenzie, Ed (1986), "Error Analysis for Winters' Additive Seasonal Forecasting System," International Journal of Forecasting, 2, 373–382.

Montgomery, D.C. and Johnson, L.A. (1976), Forecasting and Time Series Analysis, New York: McGraw-Hill.

Morf, M., Sidhu, G.S., and Kailath, T. (1974), "Some New Algorithms for Recursive Estimation on Constant Linear Discrete Time Systems," IEEE Transactions on Automatic Control, AC-19, 315–323.

Nelson, C.R. (1973), Applied Time Series for Managerial Forecasting, San Francisco: Holden-Day.

Newbold, P. (1981), "Some Recent Developments in Time Series Analysis," International Statistical Review, 49, 53–66.

Newton, H. Joseph and Pagano, Marcello (1983), "The Finite Memory Prediction of Covariance Stationary Time Series," SIAM Journal of Scientific and Statistical Computing, 4, 330–339.

Pankratz, Alan (1983), Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, New York: John Wiley & Sons.

Pankratz, Alan (1991), Forecasting with Dynamic Regression Models, New York: John Wiley & Sons.

Pankratz, A. and Dudley, U. (1987), "Forecast of Power-Transformed Series," Journal of Forecasting, 6(4), 239–248.

Pearlman, J.G. (1980), "An Algorithm for the Exact Likelihood of a High-Order Autoregressive Moving-Average Process," Biometrika, 67, 232–233.

Priestley, M.B. (1981), Spectral Analysis and Time Series, Volume 1: Univariate Series, New York: Academic Press.

Roberts, S.A. (1982), "A General Class of Holt-Winters Type Forecasting Models," Management Science, 28, 808–820.

Schwarz, G. (1978), "Estimating the Dimension of a Model," Annals of Statistics, 6, 461–464.

Sweet, A.L. (1985), "Computing the Variance of the Forecast Error for the Holt-Winters Seasonal Models," Journal of Forecasting, 4, 235–243.

Winters, P.R. (1960), "Forecasting Sales by Exponentially Weighted Moving Averages," Management Science, 6, 324–342.

Woodfield, T.J. (1987), "Time Series Intervention Analysis Using SAS Software," Proceedings of the Twelfth Annual SAS Users Group International Conference, 331–339, Cary, NC: SAS Institute Inc.

Yar, M. and Chatfield, C. (1990), "Prediction Intervals for the Holt-Winters Forecasting Procedure," International Journal of Forecasting, 6, 127–137.
Part V
SAS/ETS Model Editor (Experimental)
Chapter 47

SAS/ETS Model Editor Window Reference

Contents
Overview of SAS/ETS Model Editor . . . 2926
Invoking the SAS/ETS Model Editor Application . . . 2926
Model Browser Window . . . 2927
Create a New Model Template . . . 2928
    Equations Details . . . 2930
    Constraints Details . . . 2932
New Fitted Model Wizard . . . 2934
    Name Your Model Page . . . 2936
    Select the Data Set to Fit Page . . . 2936
    Assign Variables Page . . . 2940
    Enter (Verify) the Formula for Your Model Page . . . 2942
    Map Program Symbols to Data Set Variable Page . . . 2943
    Set Fit Options Page . . . 2945
Output and Reports . . . 2952
    Equation Results Window . . . 2952
    Time Series Window . . . 2954
    Graphing Options Window . . . 2955
    Model Summary Results Window . . . 2956
    Fitted Model Equation Results Window . . . 2958
    Fitted Model Covariance/Correlation Matrix Window . . . 2959
    Fitted Model Distribution Window . . . 2959
    Model Parameters Estimates Window . . . 2960
    Residuals Plot Window . . . 2961
    Model Summary of Residual Errors Window . . . 2962
    Actual v/s Predicted Plot Window . . . 2962
Edit Existing Fitted Model . . . 2963
    Open Existing Model . . . 2963
    Fit Model—Equations . . . 2964
    Fit Model—Input . . . 2965
    Fit Model—Method . . . 2967
    Fit Model—Iteration . . . 2967
    Fit Model—Tests . . . 2969
    Fit Model—Results . . . 2970
    Define the Model Parameters and Variables . . . 2971
    Model Equations . . . 2973
    Define the Model Parameters, Variables, and Equations . . . 2974
    Model Constraints . . . 2975
Overview of SAS/ETS Model Editor

The SAS/ETS Model Editor enables you to interactively create and edit model templates and fitted models that are associated with econometric time series models and risk models. These models are developed from time series and cross-sectional data. You can use these models to statistically model market behavior.

The SAS/ETS Model Editor consists of the following:

- a fitted model wizard, with which you can create and define the equation statements, variables and parameters, constraints, and fit options on a step-by-step basis. You can apply a fitted model to any specific market data.

- a program editor panel, which enables you to write programming code to define your model and create additional dialog boxes. The additional dialog boxes enable you to more explicitly specify properties associated with your model.

- a model template, which you can also create to define the equation statements, variables and parameters, and constraints that you need in the programming code. Model templates are commonly used models that can be applied to a wide variety of data.

You can use either the Fitted Model Wizard or a model template to fit the model to a specific market data set to generate the model parameter estimates and other statistical results as specified in the MODEL procedure. The MODEL procedure analyzes models in which the relationships among the variables comprise a system of one or more nonlinear equations. Primary uses of the MODEL procedure are estimation, simulation, and forecasting of nonlinear simultaneous equation models. See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information.

This chapter provides a reference to the various windows of the SAS/ETS Model Editor. The windows are presented in work flow order. Each section describes the purpose of the window, how to open it, and its controls, fields, and menus.
Invoking the SAS/ETS Model Editor Application

You can start the SAS/ETS Model Editor application when using the SAS system interactively by submitting the following command in the SAS Display Manager Program Editor, as shown in Figure 47.1:

   %modedit;
Figure 47.1 Invoking the SAS/ETS Model Editor
Model Browser Window

From the SAS/ETS Model Editor main menu, select View > Show Model Browser to open the Model Browser window. The Model Browser window displays a list of the model templates and fitted models along with attributes about each item in the list. When the window first opens, the list contains all the model templates and fitted models that are displayed in the model tree in the desktop. To constrain the items in the list, click Find and specify your constraints.

You can sort the data in each column by clicking the column heading. You can also right-click any row in the table to access the same pop-up menu that is available for entries in the model tree on the desktop.

Figure 47.2 Model Browser Window
The Model Browser window has the following controls and fields:
Types:
specifies which types of models you want to view. You can select Model Templates, Fitted Models, or both.

Library Name:
is the name of the library (for example, RISKSAMP) where the models are located.

NOTE: Predefined libraries, such as RISKSAMP, contain predefined, read-only models. You can open and apply these models to specific market data sets. But if you want to modify a predefined model template, you need to copy the model template into a temporary library such as WORK or SASUSER, rename it, and work from there. To copy an object to a desired library, right-click the object and select Duplicate.

Data Set Name:
is the input data set in which the model resides.

Find
enables you to constrain the data displayed in the Model Browser window to only those models that match the search criteria indicated.

Model Name:
is the name of the model. A model name can contain up to 32 characters, which can be underscores, letters (A–Z or a–z), or numerals (0–9). A model name cannot contain spaces.

Interval:
is the time interval (data frequency) for the input data set.

Specify
specifies which time frequency is to be used in the time series model. Click Specify to open the Specify Frequency window, which is described in the section "Specify Frequency Window" on page 2938.

Clear
clears the search criteria that you have currently specified and resets the search fields to the defaults.
Create a New Model Template

A model template is a convenient way to predefine a commonly used model. After you create a model template, you can use it repeatedly to create a fitted model for various data sets without re-creating the model parameters, equations, and constraints.

To create a new model template, right-click a SAS library in the SAS libraries panel, and then select New Model Template. A window opens with a title of the form SASLibrary.newElement. In this window, you can do the following:

- specify the equations and variable and parameter definitions in Equations on the Details tab. See "Equations Details" on page 2930 for more details.
- specify the model constraints for the model in Constraints on the Details tab. See "Constraints Details" on page 2932 for more details.
- view the generated SAS code on the SAS Code tab.
- check the syntax of the SAS code by clicking the Check Syntax button.
- save the model template by clicking OK.

For information about equation statements, variable and parameter definitions, and constraints in the MODEL procedure, see Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide).

The top of the model window has the following fields:

Name:
is the name of the current new model. You can change the default name to another name such as "MYSPEC". A name can contain up to 32 characters, which can be underscores, letters (A–Z or a–z), or numerals (0–9). A name cannot contain spaces.

Description:
is a description of the model. You can type a description such as "My favorite template". This field is optional.

Library:
is the SAS library where the model is to be stored.

The model window has two tabs: Details and SAS Code. On the Details tab, you can select Equations or Constraints in the left pane; the fields displayed in the right pane change depending on the selection in the left pane. For more information about the Details tab, see "Equations Details" on page 2930 and "Constraints Details" on page 2932.

When you define the model variables and equations, you can click the SAS Code tab to view the MODEL procedure code that is generated from the values specified on the Details tab. The following SAS statements provide an example of equivalent PROC MODEL code. See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information about PROC MODEL.

proc model outspec=(WORK.MODSPEC specname=CIR
                    speclabel="Cox Ingersoll Ross");
   endogenous rate;
   parms kappa;
   parms theta;
   parms sigma;
   label kappa = 'Speed of Mean Reversion';
   label theta = 'Long term Mean';
   label sigma = 'Constant part of Variance';
   rate = lag(rate) + kappa * (theta - lag(rate));
   h.rate = sigma * sigma * lag(rate);
run;
quit;
Equations Details

When you select Equations on the Details tab, you can specify the equation for a model and define the variables and parameters for the model.

Figure 47.3 Create New Model Template—Equations
When Equations is selected on the Details tab, the following controls and fields are displayed:

Equation:
specifies the equations for the model.

Variable and Parameter Definitions
lists the variables and parameters for the model that is defined in the Equation: field. To add a new row, click the Add button; a new entry appears in the Variable and Parameter Definitions table. To edit the value for an entry, double-click the appropriate cell in the Variable and Parameter Definitions table. To delete an entry, select an existing entry and then click the Delete button.

Name
specifies the name of the variable or parameter.

Type
specifies the variable type (such as endogenous).

Label
specifies the label of the variable or parameter.

Instrument?
specifies whether the variable or parameter is an instrument.

Output To Data Set
specifies the output data set options. The selections map to the DROP, KEEP, and OUTVARS options of the MODEL procedure.

All dependent variables are instruments.
indicates whether all of the exogenous variables are to be instrumental variables. (This check box is equivalent to the _EXOG_ option in the INSTRUMENTS statement in the MODEL procedure. See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information about PROC MODEL.)

Instruments Only Use Intercept
indicates whether to include only the intercept in the list of instrumental variables.

Instruments Include Intercept
indicates whether to include an intercept term and all of the exogenous variables as instrumental variables.

ID
specifies the name of the ID variable to be created for the input data set. You can type any valid SAS variable name in this field or select one from the list.

Range:
specifies the time range between observations in the data set. Select a range from the list.

Start:
specifies the starting date for the time series in the data set. Enter a date value in this field, using a form that is recognizable as a SAS date informat (for example, 1998:1, feb1997, or 03mar1998).

End:
specifies the ending date for the time series in the data set. Enter a date value in this field, using a form that is recognizable as a SAS date informat (for example, 1998:1, feb1997, or 03mar1998).

SAS Code
provides the SAS statements for the model.

Add button
adds a variable row to the table.

Delete button
removes a variable row from the table.

OK
saves the model template.

Cancel
closes the window without implementing any changes.

Help
provides help about this window.
Constraints Details

Use this window to specify the boundary constraints and the linear and nonlinear restrictions for the parameter estimates. To open this window, right-click a SAS library in the SAS libraries panel, select New Model Template from the resulting menu, and then click Constraints in the left pane of the window.
Figure 47.4 Create New Model Template—Constraints
When Constraints is selected on the Details tab, the following controls and fields are displayed:

Simple Bounds
specifies the boundary constraints. To create a new row, click the Add button; a new entry appears in the Simple Bounds table. To edit the value for an entry, double-click the appropriate row. To delete an entry, select an existing entry and then click the Delete button.

Restrictions
lists the linear and nonlinear restrictions. To create a new row, click the Add button; a new entry appears in the Restrictions table. To edit the values for an entry, double-click the appropriate cell in the Restrictions table. To delete an entry, select an existing entry and then click the Delete button.
New Fitted Model Wizard

The New Fitted Model wizard provides a step-by-step interface that enables you to create and define the equation statements, variable definitions, parameter definitions, and constraints that can be used to fit many different market data sets. You can create a fitted model either from scratch or by modifying an existing model template. The following instructions walk you through creating a fitted model from an existing model template.

To create a new fitted model from an existing model template, right-click a named model template in the SAS libraries panel, and then select New Fitted Model. The Fitted Model wizard opens.

The Fitted Model wizard sequentially asks you to provide the following information:

1. the name of your model and the location where it is to be stored
2. the name of an existing input data set
3. creation or verification of the formula for your model
4. the model parameters and variables
5. the mapping of the variables from the model template to the input data set variables, if the fitted model is created by using a model template
6. the names of any output data sets desired (optional)
7. the configuration of the fit options (such as estimation methods and optimization settings) for the model

After you have provided the preceding information, the new model is fitted against the supplied input data set, and the results and model are stored at the location specified. If the new fitted model is built from a model template, its content is similar to the template content with the following exceptions: the input and output data sets are defined, and all of the model program variables are changed based on the variable mapping. After a fitted model is created, subsequent changes that are made to the model template from which it was created have no effect on the fitted model.

The lower part of each page in the Fitted Model wizard contains the following controls:

Details
provides the options for each step.

SAS Code
provides the SAS statements for the entire model.

Back
moves the wizard back to the previous page.

Next
moves the wizard forward to the next page.

Finish
saves the model and causes the model program to run and generate the statistical results.

Cancel
closes the wizard unconditionally; nothing is saved.

Help
provides help about the wizard.

The following sections show the step-by-step process of the New Fitted Model wizard.
Name Your Model Page

The initial page of the wizard is the Name Your Model page. This page informs you that you are creating a fitted model from a model template. You can change the name of the model. You must specify the library and data set where your model is to reside. The name must be unique within the library and data set.

CAUTION: If you do not change the name or library, the original model template is overwritten.

Figure 47.5 Model Fitting Wizard

This page has the following controls and fields:

Name:
is the name of the current new model.

Description:
is the description of the model.

Library:
is the SAS library where the model is to be stored.

Data Set:
is the SAS data set where the model and fitted results are to be stored.
Select the Data Set to Fit Page

The Select the data set to fit page asks you to specify the name and location of an input data set. You can also view time series of any selected variables.
Figure 47.6 Model Fitting Wizard
This page has the following controls and fields:

Library Name:
is the library where the input data set is located.

Data Set Name:
is the input data set against which you want the model fitted.

Browse
opens the data set selection window for selecting an input data set. After you specify the input data set, the variable names in the data set are listed in the Select Variables to View Time Series table.

Check Data Time Series
is the list of variables, ID, frequency, and time series properties of the input data set.

Select Variables to View Time Series
lists the name and label for each variable in the data set.

ID:
specifies a single variable to identify observations in error messages, in other listings, or in the output data set. The ID variables are usually SAS date or datetime variables. You can type any valid SAS variable name in this field or select one from the list.

Frequency:
is the time interval (data frequency) for the input data set.

Specify
opens the Specify Frequency window. See the section "Specify Frequency Window" on page 2938 for more information.

View Time Series
opens a window that displays a time series plot and a table of observations for the selected variables. See the section "View Time Series of the Data Window" on page 2939 for more information.
Specify Frequency Window

When you click Specify on the Select the data set to fit page, the Specify Frequency dialog box opens. This dialog box enables you to specify the name of the time variables. This dialog box is equivalent to the INTERVAL= option in the ID statement in PROC MODEL. See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information about PROC MODEL statements and options.

Figure 47.7 Specify Frequency Window
This dialog box has the following controls and fields:

Type:
specifies the type of the date values. For example, Day specifies date values that correspond to single-day periods; Hour specifies datetime values that correspond to single-hour periods.

Multiplier:
specifies a multiplier for date values. The value 1 selects date values that correspond to periods of the duration specified by the Type value. The value 2 selects date values that correspond to periods that are twice the duration specified by the Type value.

Shift:
specifies the date values shift. The value 1 selects date values that correspond to periods of one date value beginning on the first date value. The value 2 selects date values that correspond to periods of one date value beginning on the second date value.

Weekend:
specifies days to be considered as weekends.

Date values include time of day
controls whether the date values include the time of day. When this check box is selected, date values include the time of day.

OK
closes the window and returns to the Select the data set to fit page.

Cancel
closes the window without implementing any changes and returns to the Select the data set to fit page.
View Time Series of the Data Window

Figure 47.8 View Time Series of the Data
This window has the following controls and fields:

Graphing Options
opens a window that enables you to change graphing options.

Series Values
displays a plot of the time series of the selected variables.

Observation
displays a table of all the observations of the data set.

OK
closes the View Time Series window.
Assign Variables Page

This page enables you to assign variables from the input data set to different categories such as endogenous variables, exogenous variables, and instrumental variables.
Figure 47.9 Model Fitting Wizard
This page has the following controls and fields:

Select Variables to Assign
table lists the variable names and labels of the input data set.

Endogenous Variables:
table lists the endogenous (dependent) variables of the model. An endogenous variable is a variable that is determined within the model. Endogenous variables can appear on the right and left sides of model equations, but usually only on one side for each equation.

Exogenous Variables:
table lists the exogenous (independent) variables of this model. An exogenous variable is a variable that is determined outside of the model. Exogenous variables appear on the right side of model equations, and by default exogenous variables are used as instruments for estimation methods that require instrumental variables (for example, 2SLS, IT2SLS, 3SLS, IT3SLS, GMM, and ITGMM).

Instruments:
table lists the model instrumental variables. Instrumental variables are used in estimation methods that require them (for example, 2SLS, IT2SLS, 3SLS, IT3SLS, GMM, and ITGMM). Instrumental variable estimation methods are appropriate when a model contains a right-side regressor variable that is correlated with the error term. In 2SLS and 3SLS, a first-stage regression is performed to create a linear combination of instruments to replace the regressor that is correlated with the error term. Appropriate instrumental variables are correlated with the regressor they are replacing, but they are uncorrelated with the error term. They should not depend on the endogenous variables.

Right Arrow button
enables you to move highlighted variables into the table to the right of the button.

Delete button
enables you to delete highlighted variables from the table to the left of the button.
Enter (Verify) the Formula for Your Model Page

The Enter (Verify) the formula for your model page asks you to enter or verify the formula for your model. The title of the page is "Enter the formula for your model" when you are creating a new model; it is "Verify the formula for your model" when you are modifying an existing template.

Figure 47.10 Enter or Verify the Formula for Your Model Wizard Page
This page has the following controls and fields:

Equation:
displays the equations to apply for modeling.

Variable and Parameter Definitions
displays the list of variable names, types, labels, initial values, whether each variable is an instrument, and output behavior. To specify a value in the table, click in the cell and type the desired value. To add a row to the table, click the Add button. To delete a row from the table, click the Delete button.

All dependent variables are instruments.
indicates whether all of the exogenous variables are to be instrumental variables. (This check box is equivalent to the _EXOG_ option in the INSTRUMENTS statement in the MODEL procedure. See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information.)

Instruments Only Use Intercept
indicates whether to include only the intercept in the list of instrumental variables.

Instruments Include Intercept
indicates whether to include an intercept term and all of the exogenous variables as instrumental variables.

ID
specifies the name of the ID variable to be created for the input data set. You can type any valid SAS variable name in this field or select one from the list.

Range:
specifies the time range between observations in the data set. Select a range from the list.

Start:
specifies the starting date for the time series in the data set. Enter a date value in this field, using a form that is recognizable as a SAS date informat (for example, 1998:1, feb1997, or 03mar1998).

End:
specifies the ending date for the time series in the data set. Enter a date value in this field, using a form that is recognizable as a SAS date informat (for example, 1998:1, feb1997, or 03mar1998).

SAS Code
provides the SAS statements for the model.

Add button
adds a variable row to the table.

Delete button
removes a variable row from the table.
Map Program Symbols to Data Set Variable Page

Because the variable names of the model and the input data set might not be the same, you need to define a unique mapping between the two sets of variable names. This page enables you to uniquely map the variable names that are used in an existing model program (program symbols) to the variable names of an input data set. The purpose of this mapping is to create a new model program whose variable names reflect this mapping.
Figure 47.11 Map Program Symbols to Dataset Variables Wizard Page
This page has the following controls and fields:

Program Symbols:
table lists the name, type, and label of the variables in the existing model. The types of the variables include endogenous, exogenous, and variable. The Program Symbols table contains all of the defined model variables of the model program that can be mapped. Note that the model parameters are never listed.

Data Set:
table contains all of the input data set numeric variables that can be mapped. It lists the name and label of the variables in the data set. You can identify the type of each variable by mapping program symbols to data set variables.

Use the following controls to map between entries in the Program Symbols table and the Data Set table.

New Mapping button
creates a new mapping between a model variable selected in the Program Symbols table and a data set variable selected in the Data Set table. This button becomes available only when a model variable and a data set variable are selected. To create a mapping, click the New Mapping button, then click an entry in the Program Symbols: table and drag the pointer to an entry in the Data Set: table.

Delete button
deletes the selected mapping.

Evaluate button
evaluates the data.
Automatic Mapping button
automatically maps program variables to data set variables by using their names. Click this button to map variables between the Program Symbols and Data Set tables by their variable names. If the Program Symbols table contains a variable whose name matches a variable in the Data Set table, these variables are automatically mapped to each other.

Suppose that a model with variables Cspread1m, Cspread3m, Cspread6m, Cspread12m, and Date currently exists and you want to use this existing model with an input data set that contains the variables date, usdr0, wti0m, and nymng0m. You can use the Map program symbols to data set variables page to map Cspread1m, Cspread3m, Cspread6m, Cspread12m, and Date to date, usdr0, wti0m, and nymng0m. You can then use this mapping to create a new model program that replaces all instances of Cspread1m with usdr0, all instances of Cspread3m with nymng0m, and so on. You can also use the new model program with the input data set to generate model fit results.
Set Fit Options Page

When this wizard page first opens, it appears with Equation selected from the list on the left side of the page. The contents of the page change when other values in the list on the left side are selected. The following sections describe the contents of the page for each of the possible selections.
Set Fit Options for Equations

The Set fit options page enables you to estimate model parameters by fitting the model equations to input data, and optionally to select the equations to be fit.

Figure 47.12 Set Fit Options for an Equation
This page has the following controls and fields:

Equations Available to Fit:
box lists the variables in the input data set to which you can fit equations.

Equations To Fit:
box lists the variables in the input data set to which equations are fit.

Sort By:
indicates how the fit options are sorted. This field has two list boxes. In the left list box, select the variable by which to sort. In the right list box, select Ascending, Descending, or Not Sorted.

Weight:
specifies a weighting value to apply to each observation when estimating parameters.

Missing Value Behavior:
specifies whether missing values are tracked on an equation-by-equation basis or the entire observation is omitted from the analysis when any equation has a missing predicted or actual value.

Library Name:
specifies the SAS library from which you can select your data set.

Dataset Name:
specifies the input data set from the SAS library that you would like to work on.

Browse button
opens the data set selection window for selecting an input data set.
Set Fit Options for a Method

This page enables you to choose how the parameters are to be estimated. See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information about available methods of parameter estimation.
Figure 47.13 Set Fit Options for a Method
This page has the following controls and fields:

Estimation Methods
check boxes enable you to select the corresponding estimation method. All of the methods implemented in PROC MODEL aim to minimize an objective function. The following methods for parameter estimation can be selected (the equivalent MODEL procedure option is shown in parentheses):
Full Information Maximum Likelihood (FIML)
Moore-Penrose Generalized Inverse (MPGI)
Generalized Method of Moments (GMM)
Iterated Generalized Method of Moments (ITGMM)
Ordinary Least Squares (OLS) (This is the default.)
Iterated Ordinary Least Squares (ITOLS)
Two-Stage Least Squares (2SLS)
Iterated Two-Stage Least Squares (IT2SLS)
Three-Stage Least Squares (3SLS)
Iterated Three-Stage Least Squares (IT3SLS)
Seemingly Unrelated Regression (SUR)
Iterated Seemingly Unrelated Regression (ITSUR)
When the Estimation Method check boxes for GMM or ITGMM are selected, additional fields are displayed. These fields correspond to the kernel options in the MODEL procedure. You can specify kernel options such as Parzen, Bartlett, and quadratic spectral. For more information about the KERNEL= option, see Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide).

Divisor of Variance
specifies a degrees-of-freedom correction in estimating the variance matrix. This can be the sum of the weights or the sum of the weights minus the model degrees of freedom.
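These check boxes correspond to estimation options on the FIT statement in PROC MODEL. The following sketch shows how two of them might be written directly in code (the data set, variables, and kernel settings are hypothetical):

   proc model data=work.market;            /* hypothetical input data set */
      parms a b;
      y = a + b*x;                          /* single-equation model */
      instruments z1 z2;                    /* instrumental variables */
      fit y / gmm kernel=(parzen, 1, 0.2);  /* GMM with a Parzen kernel */
      fit y / 2sls;                         /* two-stage least squares */
   run;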
Set Fit Options for an Optimization

This page provides an interface for controlling the objective function minimization process and the integration process. If you encounter difficulty in achieving convergence, changing the settings of one or more of the process control options might assist in achieving convergence. Except for the Method, the fields are numeric; type the appropriate value into the text boxes.

NOTE: If you have not been given permission to edit the model, you can only browse the minimization and integration process options.

Figure 47.14 Set Fit Options for an Optimization
This page has the following controls and fields:

Method:
specifies the Gauss method or the Marquardt method. Gauss is the default. For the Gauss method, the Gauss-Newton parameter-change vector for a system is computed at the end of each iteration. The objective function is then computed at the changed parameter values at the start of the next iteration. For the Marquardt method, at each iteration the objective function is evaluated at the parameters changed by the Marquardt-Levenberg parameter-change vector. For information about available methods of parameter estimation, see Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide).

Maximum Iterations:
specifies the maximum number of Newton iterations performed at each observation and each replication of Monte Carlo simulations.

Maximum number of step halvings:
specifies the maximum number of subiterations allowed for an iteration. For the Gauss method, this value limits the number of step halvings. For the Marquardt method, this value limits the number of times the step parameter can be increased. The default is MAXSUBITER=30. See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information.

Minimization Tuning
specifies the Convergence Criteria and Singularity Criteria.
Set Fit Options for Constraints

This page lists any previously defined constraints for the model. Initially, there are no constraints defined.

Figure 47.15 Set Fit Options for Constraints
This page has the following controls and fields:

Simple Bounds
impose simple boundary constraints on the parameter estimates. You can specify any number of statements in this field. To add a new constraint, click the Add button.

Restrictions
impose linear and nonlinear restrictions on the parameter estimates. You can use both Simple Bounds statements and Restrictions statements to impose boundary constraints. However, the Simple Bounds statements provide a simpler syntax for specifying these kinds of constraints.
Suppose you want to restrict the parameter that is associated with the first exogenous (independent) variable to be greater than zero (0). To add this parameter restriction to the model template, click the Add button and enter the restriction. If the restriction code contains syntax errors, an error dialog box appears that has instructions for correcting the restriction code.
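The Simple Bounds and Restrictions fields generate BOUNDS and RESTRICT statements in PROC MODEL. A sketch of equivalent code for constraints like the example above (the parameter and variable names are hypothetical):

   proc model data=work.market;     /* hypothetical input data set */
      parms b1 b2;
      y = b1*x1 + b2*x2;
      bounds b1 >= 0;               /* simple bound: keep b1 positive */
      restrict b1 + b2 = 1;         /* linear restriction on the estimates */
      fit y;
   run;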
Set Fit Options for Tests

This page enables you to perform tests of nonlinear hypotheses on the model parameters. You can define statistical tests that are associated with the model parameters. This page lists any previously defined parameter tests for the model. Initially, there are no tests defined.

Figure 47.16 Set Fit Options for Tests
This page has the following controls and fields:

Tests
perform tests of nonlinear hypotheses on the model parameters. Test expressions can be composed of parameter names, arithmetic operators, functions, and constants. You can specify any number of test statements by typing the test equations on this page.

Suppose that you want to test that the parameters that are associated with the first and second exogenous (independent) variables are equal. Click the Add button to display the Test Expression equation. In the Label field, type "My test". In the Expression field, type the following expression: a = b. The type of test can be Wald, Lagrange multiplier (LM), likelihood ratio (LR), or all three (ALL). To add this test, click the Add button. If the test expression contains syntax errors, an error dialog box appears that has instructions about correcting the test expression. See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information. You can define any number of tests.
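The test defined in this example corresponds to the TEST statement in PROC MODEL; a sketch of equivalent code (the model and names are hypothetical, and the ALL option is assumed to request all three test types):

   proc model data=work.market;     /* hypothetical input data set */
      parms a b;
      y = a*x1 + b*x2;
      fit y;
      test "My test" a = b ,/ all;  /* Wald, LM, and LR tests of a = b */
   run;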
Set Fit Options for Outputs

Figure 47.17 Set Fit Options for Outputs
This page has the following controls and fields:

Output
specifies which results to display for your fitted model. Select one or more check boxes (or select All) to indicate which analyses you want to include in the output data set. Select All to display all analyses for each model equation. Select Predicted to show the predicted values of the model. Select Actual to display the actual values of the model. Select Errors to display residual error analysis for each model equation. Select Lags Start to display the lag-starting observations.

Estimated Covariance of the Equation Errors
displays the cross-equation covariance. In addition, the determinant of the matrices is displayed.

Parameter Estimates
displays the parameter estimates. You can show the parameter estimate covariance and correlation matrices by selecting Include Covariance Matrix of Estimates.

Library Name:
is the name of the library where the user-specified output is saved.
Dataset Name:
is the data set in which you want the user-specified output to be saved.

Browse:
opens the data set selection window for selecting an output data set.
Output and Reports

Various output and report windows are available to display the results that are associated with fitting your model. To view output and reports, click a model name and select one of the following menu items:

Model Summary
provides summary statistics of the model.

Equation Results
provides the estimates of the equation.

Time Series
displays the plotted series.

Covariance Matrix
opens the Fitted Model Covariance/Correlation Matrix window in the SAS/ETS Model Editor. It displays the cross-equation and parameter estimate covariance and correlation matrices.

Distribution Plot
opens the Fitted Model Distribution window in the SAS/ETS Model Editor.

Parameter Estimates
displays the model parameter estimation results.

Residuals Plot
displays the Residuals Plot window for the model.

Residual Errors
displays a residual error analysis for each model equation.

Actual v/s Predicted Plot
provides a graph that compares the actual results to the predicted results.

See Chapter 18, "The MODEL Procedure" (SAS/ETS User's Guide), for more information.
Equation Results Window

The Fit Results window corresponds to the FIT statement in the SAS/ETS MODEL procedure. This window opens when you click Finish in the Fitted Model wizard, or you can open it by right-clicking a model in the model tree. In the Fit Results window, you can change model parameters and fit the model again. You can also fit two different models and compare their results. In the Displayed Fit Results list box, select each check box that corresponds to a model that you want to display. On the Full Results tab, a summary of each selected model is displayed in the upper part of the window. To change parameters for a model, click Edit in the pane that corresponds to that model. The model parameters are displayed in the lower part of the window. For information about how to edit fields there, see the section “Edit Existing Fitted Model” on page 2963. Click the Code tab to display the code for each displayed model.

Figure 47.18 Fit Results Comparison
Figure 47.19 Edit Fit Results
Time Series Window The Time Series window displays the plotted series, and you can transform and test the series. To open this window, right-click an existing fitted model in the SAS libraries panel and select Time Series or click View Time Series in the Fitted Model wizard page Select the data set to fit. For more information about time series, see Chapter 27, “The TIMESERIES Procedure,” (SAS/ETS User’s Guide).
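The plots and tests in this window are based on the TIMESERIES procedure. A minimal sketch of a comparable direct call (the data set and variable names here are hypothetical):

   proc timeseries data=work.sales plots=(series corr);
      id date interval=month;   /* time ID variable and frequency */
      var revenue;              /* series to plot and test */
   run;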
Figure 47.20 View Time Series of the Data
This window has the following controls and fields:

Graphing Options opens a window that enables you to change graphing options.
Series Values displays a plot of the time series of the selected variables.
Observation displays a table of all the observations of the data set.
OK closes the View Time Series window.
Graphing Options Window When you click Graphing Options in the Time Series window, the Graphing Options window opens.
Figure 47.21 Graphing Options Window
This window has the following controls and fields:

Plots includes the groups of check boxes for Seasonal Decomposition/Adjustment Plots, Standard Plots, Correlation Plots, and Cross-Variable Plots.
TimeSeries Options enables you to select values for simple difference, seasonal difference, cross-variable, decomposition method, functional transformation, and Box-Cox parameter.
Tables indicates which tables to display.
OK closes the window and returns to the Time Series window.
Cancel closes the window without implementing any changes.
Model Summary Results Window The Model Summary Results window displays the estimated results of the model. To open this window, right-click an existing fitted model in the SAS libraries panel and select Model Summary.
Figure 47.22 Model Summary Results Window
This window includes the following contents:

Model Summary Results for the Fitted Model provides detailed summary statistics of the model.
Equations displays the equations to be fitted in this model.
Model Variables displays the endogenous (dependent) variables of this model.
Parameter Estimates displays the model parameter estimation results. If any parameter restrictions were imposed, the statistical properties that are associated with the restriction are also displayed. Parameter restrictions are defined by using the Set fit options for constraints wizard page or the Model Template Constraints window.
Fitted Model Equation Results Window The Fitted Model Equation Results window displays the estimated dependent variable results of the model. To open this window, right-click an existing fitted model in the SAS libraries pane and select Equation Results. Figure 47.23 Fitted Model Equation Results Window
This window includes the following contents:

Actual v/s Predicted for the model variables plots the predicted and actual values.
Residuals for the model variables plots the residuals of the fitted model.
Equations displays the equations to be fitted for the model.
Parameter Estimates displays the model parameter estimation results. If any parameter restrictions were imposed, the statistical properties that are associated with the restriction are also displayed. Parameter restrictions are defined by using the Set fit options for constraints wizard page or the Model Template Constraints window.
Summary of Residual Errors displays the estimates of the residual errors.
OK closes the window.
Fitted Model Covariance/Correlation Matrix Window This window displays the cross-equation and parameter estimate covariance and correlation matrices. You can view either the covariance or the correlation matrices. In addition, the determinant of the matrices is displayed. To open this window, right-click an existing fitted model in the SAS libraries panel and select Covariance Matrix. Figure 47.24 Fitted Model Covariance/Correlation Matrix Window
Fitted Model Distribution Window The Fitted Model Distribution window displays the plots of the residual distributions. To open this window, right-click an existing fitted model in the SAS libraries panel and select Distribution Plot. The Fitted Model Distribution dialog box opens. Specify the number of replications, number of steps in time, and the random seed value, and click OK. The Fitted Model Distribution window opens.
Figure 47.25 Fitted Model Distribution Dialog Box
Figure 47.26 Fitted Model Distribution Window
Model Parameters Estimates Window The Model Parameters Estimates window enables you to view the model fit estimates. To open this window, right-click an existing fitted model in the SAS libraries panel and select Parameter Estimates.
Figure 47.27 Parameter Estimate
If the fit is unsuccessful, an error dialog box appears that contains a brief description of the problem. The details about the fitting errors are shown in the Message or the Log window.
Residuals Plot Window This window opens the output data set to enable you to view and analyze model residuals in graphical plots. To open this window, right-click an existing fitted model in the SAS libraries panel and select Residuals Plot. Figure 47.28 Residuals Plot
Model Summary of Residual Errors Window This window displays the residual error analysis for each model equation. To open this window, right-click an existing fitted model in the SAS libraries panel and select Residual Errors. Figure 47.29 Model Summary of Residual Errors Window
Actual v/s Predicted Plot Window

This window displays the actual and predicted data sets so that you can view and analyze the model’s predictive power in graphical plots. To open this window, right-click an existing fitted model in the SAS libraries panel and select Actual v/s Predicted Plot.

Figure 47.30 Actual v/s Predicted Plot Window
Edit Existing Fitted Model
Open Existing Model In the SAS/ETS Model Editor window, click the Sources tab. In the Display Preference list box, select All Models to display all of the predefined models. If necessary, expand items in the Name box. Right-click the model name and select Open. The model window opens. The title of the window corresponds to the structure where the model is located. The model window contains a pane on the left with values such as Variables, Equations, Constraints, and Fit. The contents of the right pane change depending on the selection made in the left pane. Figure 47.31 Open Existing Model
The model window always contains the following controls and fields:

Details: tab displays details for the selection highlighted in the left pane.
SAS Code provides the SAS statements for this window.
Check Syntax displays any errors in the MODEL procedure syntax.
OK runs the model and saves the results.
Cancel closes the window without implementing any changes.
Fit Model—Equations

To select which equations to fit, select Equations under Fit in the left pane of the model window. In the right pane, select an entry from the Equations Available to Fit: list and click the right-arrow button. The equation is moved to the Equations to Fit: list.

Figure 47.32 Define Equations

This right pane has the following controls and fields:

Equations Available to Fit: list lists equations that are available to be fit. If the list of equations is blank, all model equations that contain parameters are fitted.
Equations To Fit: list displays the equations selected to be fit. To delete an equation, select an existing equation and then click the Delete Row button.
Fit Model—Input

To specify input data, select Input under Fit in the left pane of the model window. The right pane then displays fields where you can select the library name and data set name, map program symbols to data set variables, and define the covariance matrix of equation errors.

Figure 47.33 Input Data Set
This right pane has the following controls and fields:

Library Name: is the name of the library where the current model is located.
Data Set Name: is the name of the input data set against which you want the model fitted.
Browse: opens the data set selection window for selecting an input data set.
ID: specifies a single variable to identify observations in error messages or in other listings and in the output data set. The ID variables are usually SAS date or datetime variables. You can type any valid SAS variable name in this field or select from the list.
Frequency: is the time interval (data frequency) for the input data set.

Specify specifies which frequency in the list is to be used in the time series model. When you click Specify, the Specify Frequency window opens. See the section “Specify Frequency Window” on page 2938 for more information.

Sort By: indicates how the fit options are sorted. This field has two list boxes. In the left list box, select the variable by which to sort. In the right list box, select Ascending, Descending, or Not Sorted.

Weight: specifies a variable that supplies a weighting value to use for each observation in estimating parameters.

Missing Value Behavior: specifies whether missing values are tracked on an equation-by-equation basis or whether the entire observation is omitted from the analysis when any equation has a missing predicted or actual value for the equation.

Program Symbols: table lists the name, type, and label of the variables in the existing model. The types of the variables include endogenous, exogenous, and variable.

Data Set: table lists the name and label of the variables in the data set. You can identify the type of each variable by mapping program symbols to data set variables.

New Mapping button creates a new mapping between a model variable selected in the Program Symbols: table and a data set variable selected in the Data Set: table. This button becomes available only when both a model variable and a data set variable are selected. To create a mapping, click the New Mapping button, then click an entry in the Program Symbols: table and drag the pointer to an entry in the Data Set: table.

Delete button deletes the selected mapping.

Evaluate button evaluates the data.

Automatic Mapping button automatically maps program variables to data set variables by using their names. Click this button to map variables between the Program Symbols and Data Set tables by their variable names. If the Program Symbols table contains a variable whose name matches a variable in the Data Set table, these variables are automatically mapped to each other.

Covariance Matrix of Equation Errors specifies the name of the input data set and the name of the library where it is stored.
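These input fields map to data-handling options in PROC MODEL. A sketch of the correspondence, with hypothetical data set and variable names:

   proc model data=work.sales;    /* Data Set Name: */
      id date;                    /* ID: */
      weight w;                   /* Weight: */
      y = intercept + a*x1;
      fit y / missing=pairwise;   /* Missing Value Behavior: */
   quit;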
Fit Model—Method

To specify how the model parameters are to be estimated, select Method under Fit in the left pane of the model window. For more information about available methods of parameter estimation, see Chapter 18, “The MODEL Procedure” (SAS/ETS User’s Guide).

Figure 47.34 Model Fit Method
This right pane has the following controls and fields:

Estimation Methods check boxes enable you to select the corresponding estimation method. The check boxes correspond to the methods implemented in PROC MODEL, all of which aim to minimize an objective function.

Divisor of Variance specifies a degrees-of-freedom correction in estimating the variance matrix. This can be the sum of the weights or the sum of the weights minus the model degrees of freedom.
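Each check box corresponds to an estimation option of the FIT statement, and Divisor of Variance corresponds to the VARDEF= option. A sketch with hypothetical model and data set names:

   proc model data=work.sales;
      y = intercept + a*x1 + b*x2;
      fit y / fiml vardef=df;   /* full information maximum likelihood;
                                   divisor = degrees of freedom */
   quit;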
Fit Model—Iteration To specify the parameters for minimizing the objective function, select Iteration under Fit in the left pane of the model window.
Figure 47.35 Model Estimation Iteration
This right pane has the following controls and fields:

Method: specifies the Gauss method or the Marquardt method. Gauss is the default. For the Gauss method, the Gauss-Newton parameter-change vector for a system is computed at the end of each iteration, and the objective function is then computed at the changed parameter values at the start of the next iteration. For the Marquardt method, at each iteration the objective function is evaluated at the parameters changed by the Marquardt-Levenberg parameter-change vector. For more information about available methods of parameter estimation, see Chapter 18, “The MODEL Procedure” (SAS/ETS User’s Guide).
Maximum Iterations: specifies the maximum number of Newton iterations performed at each observation and each replication of Monte Carlo simulations.

Maximum number of step halvings: specifies the maximum number of subiterations allowed for an iteration. For the Gauss method, this value limits the number of step halvings. For the Marquardt method, this value limits the number of times the step parameter can be increased. The default is MAXSUBITER=30. See Chapter 18, “The MODEL Procedure” (SAS/ETS User’s Guide), for more information.

Minimization Tuning specifies the Convergence Criteria and Singularity Criteria.
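These fields correspond to minimization options of the FIT statement. A sketch (the model and data set are hypothetical):

   proc model data=work.sales;
      y = intercept + a*x1;
      fit y / method=marquardt   /* Method: option */
              maxiter=100        /* Maximum Iterations: */
              maxsubiter=30      /* Maximum number of step halvings: */
              converge=1e-5;     /* Convergence Criteria */
   quit;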
Fit Model—Tests The SAS/ETS Model Editor enables you to perform tests of nonlinear hypotheses on the model parameters. To define statistical tests that are associated with the model parameters, select Tests under Fit in the left pane of the model window. Any previously defined parameter tests for the model are listed. Figure 47.36 Fit Tests
This right pane has the following controls and fields:
Tests lists tests of nonlinear hypotheses to be performed on the model parameters. Test expressions can be composed of parameter names, arithmetic operators, functions, and constants. To add a test, click the Add button and enter the parameters for the test. To delete a test, select the test and then click the Delete button.
Fit Model—Results To specify which results you want and to save them to an output data set, select Results under Fit in the left pane of the model window. The right pane then provides fields where you can specify which results you want to save and where you can specify the library and data set name where the output is saved. Figure 47.37 Output Selection
This right pane has the following controls and fields:

Output specifies the results from your model fitting that are to be displayed. Select the check box for each analysis you want to include in the output data set, or select All to display all analyses for each model equation.

Estimated Covariance of the Equation Errors specifies that the cross-equation covariance matrix is to be displayed and also saved in the output data set specified in Data Set Name: in the library specified in Library Name:. In addition, the determinant of the matrix is displayed.

Parameter Estimates specifies that the parameter estimates are to be displayed and also saved in the output data set specified in Data Set Name: in the library specified in Library Name:. Select Include Covariance Matrix of Estimates to also display and save the estimated covariance and correlation matrices.

Library Name: is the name of the library where the model fit results are to be saved.

Data Set Name: is the data set in which you want the model fit results to be saved.

Browse: opens the data set selection window for selecting an output data set.
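These choices correspond to output data set options of the FIT statement. A sketch with hypothetical library and data set names:

   proc model data=work.sales;
      y = intercept + a*x1;
      fit y / outest=mylib.est outcov  /* parameter estimates plus their
                                          covariance matrix */
              outs=mylib.s;            /* covariance of the equation errors */
   quit;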
Define the Model Parameters and Variables To select input variables for use by the SAS/ETS Model Editor and add them to the Variable and Parameter Definitions list, select Parameters and Variables in the left pane of the model window. Figure 47.38 Create New Model Template—Equations
Figure 47.38 continued
The right pane has the following controls and fields:

Equation: enables you to enter the equations for the model in the Equation editor.

Variable and Parameter Definitions displays the variable and parameter definitions for the equations that are shown in the Equation field. To add a new row, click the Add button; a new entry appears in the Variable and Parameter Definitions table. To edit the values for an entry, double-click the appropriate cell in the Variable and Parameter Definitions table. To delete an entry, select an existing entry and then click the Delete button.

Name column specifies the name of the variable or parameter.

Type column specifies the variable type. To change the type, click the down arrow and then select from the resulting list.

Label column specifies the label of the variable or parameter.

Instrument? column specifies whether the variable or parameter is an instrument. To change the value, click the down arrow and select from the resulting list.

Output To Data Set column specifies the set of data set options for displaying output and saving it to a data set. To change the value, click the down arrow and select from the resulting list.

All dependent variables are instruments. specifies whether all dependent variables are instruments.

Instruments Only Use Intercept. specifies whether the instruments use only an intercept.

Instruments Includes Intercept. specifies whether the instruments include an intercept.

ID specifies the ID variable. To change the value, click the down arrow and select from the resulting list.

Range: specifies the time range between observations in the data set. Select a range from the list.

Start: specifies the starting date for the time series in the data set. Enter a date value in this field, using a form that is recognizable as a SAS date informat (for example, 1998:1, feb1997, or 03mar1998).

End: specifies the ending date for the time series in the data set. Enter a date value in this field, using a form that is recognizable as a SAS date informat (for example, 1998:1, feb1997, or 03mar1998).
Model Equations To specify equations to use in the model, select Equations in the left pane of the model window.
Figure 47.39 Create New Model Template—Equations
The top of the right pane has the following controls and fields:

Equation: enables you to enter the equations for the model.
Define the Model Parameters, Variables, and Equations

To select input variables for use by the SAS/ETS Model Editor and add them to the Variable and Parameter Definitions list, select Parameters and Variables in the left pane of the model window. To specify equations to use in the model, select Equations in the left pane. The right pane combines the Equation editor with the Variable and Parameter Definitions table; its controls and fields are the same as those described in the sections “Define the Model Parameters and Variables” and “Model Equations.”
Model Constraints

To specify the boundary constraints and linear and nonlinear restrictions for the parameter estimates, select Constraints in the left pane of the model window.
Figure 47.40 Create New Model Template—Constraints
The right pane has the following controls and fields:

Simple Bounds specifies the boundary constraints. To add a constraint, click the Add button; a new entry appears in the Simple Bounds table. To edit an entry, double-click the entry. To delete an entry, select it and then click the Delete button.

Restrictions specifies the linear and nonlinear restrictions. To add a restriction, click the Add button; a new entry appears in the Restrictions table. To edit an entry, double-click the entry. To delete an entry, select it and then click the Delete button.
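Simple bounds and restrictions correspond to the BOUNDS and RESTRICT statements of PROC MODEL. A sketch (the model is hypothetical):

   proc model data=work.sales;
      y = intercept + a*x1 + b*x2;
      bounds a >= 0, b <= 10;   /* Simple Bounds */
      restrict a + b = 1;       /* a linear restriction on the estimates */
      fit y;
   quit;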
Part VI
Investment Analysis
Chapter 48
Overview

Contents
About Investment Analysis ... 2979
Starting Investment Analysis ... 2980
Getting Help ... 2981
Using Help ... 2981
Software Requirements ... 2982
About Investment Analysis

The Investment Analysis system is an interactive environment for time-value-of-money analysis of a variety of investments:

loans
savings
depreciations
bonds
generic cashflows

Various analyses are provided to help you assess the value of investment alternatives: time value, periodic equivalent, internal rate of return, benefit-cost ratio, and breakeven analysis. These analyses can help answer a number of questions you might have about your investments:

Which option is more profitable or less costly?
Is it better to buy or rent?
Are the extra fees for refinancing at a lower interest rate justified?
What is the balance of this account after saving this amount periodically for so many years?
How much is legally tax-deductible?
Is this a reasonable price?

Investment Analysis can be beneficial to users in many industries for a variety of decisions:

manufacturing: cost justification of automation or any capital investment, replacement analysis of major equipment, or economic comparison of alternative designs
government: setting funds for services
finance: investment analysis and portfolio management for fixed-income securities
Starting Investment Analysis

There are two ways to invoke Investment Analysis from the main SAS window. One way is to select Solutions → Analysis → Investment Analysis from the main SAS menu, as displayed in Figure 48.1.

Figure 48.1 Initializing Investment Analysis with the Menu Bar
The other way is to type INVEST into the toolbar’s command prompt, as displayed in Figure 48.2. Figure 48.2 Initializing Investment Analysis with the Toolbar
Getting Help You can get help in Investment Analysis in three ways. One way is to use the Help Menu, as displayed in Figure 48.3. This is the right-most menu item on the menu bar. Figure 48.3 The Help Menu
Help buttons, as in Figure 48.4, provide another way to access help. Most dialog boxes provide help buttons in their lower-right corners. Figure 48.4 A Help Button
Also, the toolbar has a button (see Figure 48.5) that invokes the help system. This is the right-most icon on the toolbar. Figure 48.5 The Help Icon
Each of these methods invokes a browser that gives specific help for the active window.
Using Help The chapters pertaining to Investment Analysis in this document typically have a section that introduces you to a menu and summarizes the options available through the menu. Such chapters then have sections titled Task and Dialog Box Guides. The Task section provides a description of how to perform many useful tasks. The Dialog Box Guide lists all dialog boxes pertinent to those tasks and gives a brief description of each element of each dialog box.
Software Requirements

Investment Analysis uses the following SAS software:

Base SAS
SAS/ETS
SAS/GRAPH (optional, to view bond pricing and breakeven graphs)
Chapter 49
Portfolios

Contents
The File Menu ... 2983
Tasks ... 2984
Creating a New Portfolio ... 2984
Saving a Portfolio ... 2985
Opening an Existing Portfolio ... 2985
Saving a Portfolio to a Different Name ... 2986
Selecting Investments within a Portfolio ... 2987
Dialog and Utility Guide ... 2988
Investment Analysis ... 2988
Menu Bar Options ... 2989
Right-Clicking within the Portfolio Area ... 2989
The File Menu

Investment Analysis stores portfolios as catalog entries. Portfolios contain a collection of investments, providing a structure to collect investments with a common purpose or goal (like a retirement or building fund portfolio). It can also be advantageous to collect competing investments into a common portfolio when you want to perform a comparative analysis on them. Within this structure you can perform computations and analyses on a collection of investments in a portfolio, just as you would perform them on a single investment. Investment Analysis provides many tools to aid in your manipulation of portfolios through the File menu, shown in Figure 49.1.
Figure 49.1 File Menu
The File menu offers the following items:

New Portfolio creates an empty portfolio with a new name.
Open Portfolio opens the standard SAS Open dialog box where you select a portfolio to open.
Save Portfolio saves the current portfolio to its current name.
Save Portfolio As opens the standard SAS Save As dialog box where you supply a new portfolio name for the current portfolio.
Close closes Investment Analysis.
Exit closes SAS (Windows only).
Tasks
Creating a New Portfolio

From the Investment Analysis dialog box, select File → New Portfolio.
Figure 49.2 Creating a New Portfolio
The Portfolio Name is WORK.INVEST.INVEST1 as displayed in Figure 49.2, unless you have saved a portfolio to that name in the past. In that case, some other unused portfolio name is given to the new portfolio.
Saving a Portfolio

From the Investment Analysis dialog box, select File → Save Portfolio. The portfolio is saved to a catalog entry with the name in the Portfolio Name box.
Opening an Existing Portfolio

From the Investment Analysis dialog box, select File → Open Portfolio. This opens the standard SAS Open dialog box. You enter the name of a SAS portfolio to open in the Entry Name box. For example, enter SASHELP.INVSAMP.NVST, as displayed in Figure 49.3.
Figure 49.3 Opening an Existing Portfolio
Click Open to load the portfolio. The portfolio should look like Figure 49.4. Figure 49.4 The Opened Portfolio
Saving a Portfolio to a Different Name

From the Investment Analysis dialog box, select File → Save Portfolio As. This opens the standard SAS Save As dialog box. You can enter the name of a SAS portfolio into the Entry Name box. For example, enter SASUSER.MY_PORTS.PORT1, as in Figure 49.5.
Figure 49.5 Saving a Portfolio to a Different Name
Click Save to save the portfolio.
Selecting Investments within a Portfolio To select a single investment in an opened portfolio, click the investment in the Portfolio area within the Investment Analysis dialog box. To select a list of adjacent investments, do the following: click the first investment, hold down SHIFT, and click the final investment. After the list of investments is selected, you can release the SHIFT key. The selected investments will appear highlighted as in Figure 49.6. Figure 49.6 Selecting Investments within a Portfolio
Dialog and Utility Guide
Investment Analysis Figure 49.7 Investment Analysis Dialog Box
Investment Portfolio Name holds the name of the portfolio. The name is of the form library.catalog_entry.portfolio. The default portfolio name is work.invest.invest1, as in Figure 49.7.

Portfolio Description provides a more descriptive explanation of the portfolio’s contents. You can edit this description any time this dialog box is active.

The Portfolio area contains the list of investments that make up the particular portfolio. Each investment in the Portfolio area displays the following attributes:

Name is the name of the investment. It must be a valid SAS name. It is used to distinguish investments when performing analyses and computations.
Label is a place where you can provide a more descriptive explanation of the investment.
Type is the type of investment, which is fixed when you create the investment. It is one of the following: LOAN, SAVINGS, DEPRECIATION, BOND, or OTHER.

Additional tools to aid in the management of your portfolio are available by selecting from the menu bar or by right-clicking within the Portfolio area.
Menu Bar Options Figure 49.8 The Menu Bar
The menu bar (shown in Figure 49.8) provides many tools to aid in the management of portfolios and the investments that comprise them. The following menu items provide functionality particular to Investment Analysis:

File opens and saves portfolios.
Investment creates new investments within the portfolio.
Compute performs constant dollar, after-tax, and currency conversion computations on generic cashflows.
Analyze analyzes investments to aid in decision making.
Tools sets default values of inflation and income tax rates.
Right-Clicking within the Portfolio Area Figure 49.9 Right-Clicking
After selecting an investment, right-clicking in the Portfolio area opens a pop-up menu (see Figure 49.9) that offers the following options:

Edit opens the selected investment within the portfolio.
Duplicate creates a duplicate of the selected investment within the portfolio.
Delete removes the selected investment from the portfolio.

If you want to perform one of these actions on a collection of investments, you must select the collection of investments (as described in the section “Selecting Investments within a Portfolio” on page 2987) before right-clicking.
Chapter 50
Investments

Contents
The Investment Menu ... 2991
Tasks ... 2993
Loan Tasks ... 2993
Specifying Savings Terms to Create an Account Summary ... 3000
Depreciation Tasks ... 3001
Bond Tasks ... 3004
Generic Cashflow Tasks ... 3008
Dialog Box Guide ... 3015
Loan ... 3015
Loan Initialization Options ... 3017
Loan Prepayments ... 3018
Balloon Payments ... 3019
Rate Adjustment Terms ... 3019
Rounding Off ... 3021
Savings ... 3021
Depreciation ... 3023
Depreciation Table ... 3025
Bond ... 3025
Bond Analysis ... 3027
Bond Price ... 3028
Generic Cashflow ... 3029
Right-Clicking within Generic Cashflow’s Cashflow Specification Area ... 3030
Flow Specification ... 3031
Forecast Specification ... 3033
The Investment Menu Because there are many types of investments, a tool that manages and analyzes collections of investments must be robust and flexible. Providing specifications for four specific investment types and one generic type, Investment Analysis can model almost any real-world investment.
Figure 50.1 Investment Menu
The Investment menu, shown in Figure 50.1, offers the following items:

New → Loan opens the Loan dialog box. Loans are useful for acquiring capital to pursue various interests. Available terms include rate adjustments for variable-rate loans, initialization costs, prepayments, and balloon payments.

New → Savings opens the Savings dialog box. Savings are necessary when planning for the future, whether for business or personal purposes. Account summary calculations available per deposit include starting balance, deposits, interest earned, and ending balance.

New → Depreciation opens the Depreciation dialog box. Depreciations are relevant in tax calculation. The available depreciation methods are Straight Line, Sum-of-years Digits, Depreciation Table, and Declining Balance. Depreciation Tables are necessary when depreciation calculations must conform to set yearly percentages. Declining Balance with conversion to Straight Line is also provided.

New → Bond opens the Bond dialog box. Bonds have widely varying terms depending on the issuer. Because bond issuers frequently auction their bonds, the ability to price a bond between the issue date and maturity date is desirable. Fixed-coupon bonds can be analyzed for price versus yield-to-maturity, duration, and convexity at different times in the bond’s life.

New → Generic Cashflow opens the Generic Cashflow dialog box. Generic cashflows are the most flexible investments; only a sequence of date-amount pairs is necessary for specification. You can enter date-amount pairs and load values from SAS data sets to specify any type of investment, and you can generate uniform, arithmetic, and geometric cashflows with ease. SAS forecasting ability is available to forecast future cashflows as well. The new graphical display aids in visualization of the cashflow and enables you to change the frequency of the cashflow view to aggregate and disaggregate the view.

Edit opens the specification dialog box for an investment selected within the portfolio.

Duplicate creates a duplicate of an investment selected within the portfolio.

Delete removes an investment selected from the portfolio.

If you want to edit, duplicate, or delete a collection of investments, you must select the collection as described in the section “Selecting Investments within a Portfolio” on page 2987 before performing the menu option.
Tasks
Loan Tasks

Suppose you want to buy a home that costs $100,000. You can make a down payment of $20,000. Hence, you need a loan of $80,000. You are able to acquire a 30-year loan at 7% interest starting January 1, 2000. Let’s use Investment Analysis to specify and analyze this loan. In the Investment Analysis dialog box, select Investment → New → Loan from the menu bar to open the Loan dialog box.
Specifying Loan Terms to Create an Amortization Schedule

You must specify the loan before generating the amortization table. To specify the loan, follow these steps:

1. Enter MORTGAGE for the Name.
2. Enter 80000 for the Loan Amount.
3. Enter 7 for the Initial Rate.
4. Enter 360 for the Number of Payments.
5. Enter 01JAN2000 for the Start Date.

After you have specified the loan, click Create Amortization Schedule to generate the amortization schedule displayed in Figure 50.2.
Figure 50.2 Creating an Amortization Schedule
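If you prefer batch code, the same mortgage can be specified with the LOAN procedure. This sketch reproduces the terms entered above:

   proc loan start=2000:1;
      fixed amount=80000 rate=7 life=360
            label='MORTGAGE' schedule;
   run;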
Storing Other Loan Terms

Let’s include information concerning the purchase price and down payment. These terms are not necessary to specify the loan, but it may be advantageous to store such information with the loan. Consider the loan described in the section “Loan Tasks” on page 2993. In the Loan dialog box (Figure 50.2), click Initialization to open the Loan Initialization Options dialog box, where you can specify the down payment, initialization costs, and discount points. To specify the down payment, enter 100000 for the Purchase Price, as shown in Figure 50.3.
Figure 50.3 Including the Purchase Price
Click OK to return to the Loan dialog box.
Adding Prepayments Now let’s observe the effect of prepayments on the loan. Consider the loan described in the section “Loan Tasks” on page 2993. You must pay a minimum of $532.24 each month to keep up with payments. However, let’s say you dislike entering this amount in your checkbook. You would rather pay $550.00 to keep the arithmetic simpler. This would constitute a uniform prepayment of $17.76 each month. In the Loan dialog box, click Prepayments, which opens the Loan Prepayments dialog box shown in Figure 50.4.
Figure 50.4 Specifying the Loan Prepayments
You can specify an arbitrary sequence of prepayments in the Prepayments area. If you want a uniform prepayment, clear the Prepayments area and enter the uniform payment amount in the Uniform Prepayment text box. That amount is added to each payment until the loan is paid off. To specify this uniform prepayment, follow these steps:

1. Enter 17.76 for the Uniform Prepayment.
2. Click OK to return to the Loan dialog box.
3. Click Create Amortization Schedule, and the amortization schedule is updated, as displayed in Figure 50.5.
Figure 50.5 The Amortization Schedule with Loan Prepayments
The last payment is on January 2030 without prepayments and February 2027 with prepayments; you would pay the loan off almost three years earlier with the $17.76 prepayments. To continue this example, you must remove the prepayments from the loan specification by following these steps:

1. Reopen the Loan Prepayments dialog box from the Loan dialog box by clicking Prepayments.
2. Enter 0 for Uniform Prepayment.
3. Click OK to return to the Loan dialog box.
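The same what-if analysis can be run in batch with the PREPAYMENTS= option of PROC LOAN; a sketch:

   proc loan start=2000:1;
      fixed amount=80000 rate=7 life=360
            prepayments=17.76          /* uniform prepayment per payment */
            label='MORTGAGE' schedule;
   run;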
Adding Balloon Payments Consider the loan described in the section “Loan Tasks” on page 2993. Suppose you cannot afford the payments of $532.24 each month. To lessen your monthly payment, you could pay balloon payments of $6,000 at the end of 2007 and 2023. You wonder how this would affect your monthly payment. (Note that Investment Analysis does not allow both balloon payments and rate adjustments to be specified for a loan.) In the Loan dialog box, click Balloon Payments, which opens the Balloon Payments dialog box shown in Figure 50.6.
Figure 50.6 Defining Loan Balloon Payments
You can specify an arbitrary sequence of balloon payments by adding date-amount pairs to the Balloon Payments area. To specify these balloon payments, follow these steps:

1. Right-click within the Balloon Payment area (which opens a pop-up menu) and release on New.
2. Set the pair’s Date to 01JAN2007.
3. Set Amount to 6000.
4. Right-click within the Balloon Payment area and release on New.
5. Set the new pair’s Date to 01JAN2023.
6. Set its Amount to 6000.

Click OK to return to the Loan dialog box. Click Create Amortization Schedule, and the amortization schedule is updated. Your monthly payment is now $500.30, a difference of approximately $32 each month. To continue this example, you must remove the balloon payments from the loan specification by following these steps:

1. Reopen the Balloon Payments dialog box.
2. Right-click within the Balloon Payment area and release on Clear.
3. Click OK to return to the Loan dialog box.
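In batch, balloon payments are specified with the BALLOON statement of PROC LOAN. A sketch of the example above (see the LOAN procedure chapter for the exact BALLOONPAYMENT= date syntax):

   proc loan start=2000:1;
      balloon amount=80000 rate=7 life=360
              balloonpayment=('01jan2007'd=6000 '01jan2023'd=6000)
              label='MORTGAGE' schedule;
   run;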
Handling Rate Adjustments Consider the loan described in the section “Loan Tasks” on page 2993. Another option for lowering your payments is to get a variable rate loan. You can acquire a three-year adjustable rate mortgage (ARM) at 6% with a periodic cap of 1% with a maximum of 9%. (Note that Investment Analysis does not allow both rate adjustments and balloon payments to be specified for a loan.) In the Loan dialog box, click Rate Adjustments to open the Rate Adjustment Terms dialog box shown in Figure 50.7. Figure 50.7 Setting the Rate Adjustments
To specify these loan adjustment terms, follow these steps:

1. Enter 3 for the Life Cap. The Life Cap is the maximum deviation from the Initial Rate.
2. Enter 1 for the Periodic Cap.
3. Enter 36 for the Adjustment Frequency.
4. Confirm that Worst Case is selected from the Rate Adjustment Assumption options.
5. Click OK to return to the Loan dialog box.
6. Enter 6 for the Initial Rate.
7. Click Create Amortization Schedule, and the amortization schedule is updated.

Your monthly payment drops to $479.64 each month. However, if the worst-case scenario plays out, the payments will increase to $636.84 in nine years. Figure 50.8 displays amortization table information for the final few months under this scenario.
Figure 50.8 The Amortization Schedule with Rate Adjustments
Click OK to return to the Investment Analysis dialog box.
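The batch equivalent of this analysis uses the ARM statement of PROC LOAN. A sketch of the worst-case scenario (consult the LOAN procedure chapter for the exact cap option names):

   proc loan start=2000:1;
      arm amount=80000 rate=6 life=360
          adjustfreq=36 caps=(1,3)   /* periodic cap 1%, life cap 3% */
          worstcase
          label='MORTGAGE' schedule;
   run;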
Specifying Savings Terms to Create an Account Summary

Suppose you put $500 each month into an account that earns 6% interest for 20 years. What is the balance of the account after those 20 years? In the Investment Analysis dialog box, select Investment → New → Savings from the menu bar to open the Savings dialog box. To specify the savings, follow these steps:

1. Enter RETIREMENT for the Name.
2. Enter 500 for the Periodic Deposit.
3. Enter 240 for the Number of Deposits.
4. Enter 6 for the Initial Rate.
You must specify the savings before generating the account summary. After you have specified the savings, click Create Account Summary to compute the ending date and balance and to generate the account summary displayed in Figure 50.9. Figure 50.9 Creating an Account Summary
Click OK to return to the Investment Analysis dialog box.
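You can check the ending balance with the DATA step FINANCE function, which mirrors the spreadsheet FV function. A sketch, assuming monthly compounding and end-of-period deposits:

   data _null_;
      balance = finance('fv', 0.06/12, 240, -500, 0);
      put balance= comma12.2;   /* roughly $231,000 */
   run;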
Depreciation Tasks

Commercial assets are considered to lose value as time passes. For tax purposes, you want to quantify this loss. This investment structure helps calculate appropriate values. Suppose you spend $50,000 for a commercial fishing boat that is considered to have a ten-year useful life. How would you depreciate it? In the Investment Analysis dialog box, select Investment → New → Depreciation from the menu bar to open the Depreciation dialog box.
Specifying Depreciation Terms to Create a Depreciation Table

To specify the depreciation, follow these steps:

1. Enter FISHING_BOAT for the Name.
2. Enter 50000 for the Cost.
3. Enter 2000 for the Year of Purchase.
4. Enter 10 for the Useful Life.
5. Enter 0 for the Salvage Value.

You must specify the depreciation before generating the depreciation schedule. After you have specified the depreciation, click Create Depreciation Schedule to generate a depreciation schedule like the one displayed in Figure 50.10.

Figure 50.10 Creating a Depreciation Schedule
The default depreciation method is Declining Balance (with Conversion to Straight Line). Try the following methods to see how each affects the schedule:

Straight Line
Sum-of-years Digits
Declining Balance (without conversion to Straight Line)

It might be useful to compare the value of the boat at 5 years for each method. A description of these methods is available in the section “Depreciation Methods” on page 3067.
Using the Depreciation Table

Sometimes you want to force the depreciation rates to be certain percentages each year. This option is particularly useful for calculating modified accelerated cost recovery system (MACRS) depreciations. The United States’ Tax Reform Act of 1986 set depreciation rates for an asset based on an assumed lifetime for that asset. Because these lists of rates are important to many people, Investment Analysis provides SAS data sets for situations with yearly rates (using the “half-year convention”). Find them at SASHELP.MACRS*, where * refers to the class of the property. For example, use SASHELP.MACRS15 for a fifteen-year property. (When using the MACRS with the Tax Reform Act tables, you must set the Salvage Value to zero.) Suppose you want to compute the depreciation schedule for the commercial fishing boat described in the section “Depreciation Tasks” on page 3001. The boat is a ten-year property according to the Tax Reform Act of 1986. To employ the MACRS depreciation from the Depreciation dialog box, follow these steps:

1. Select the Depreciation Table option within the Depreciation Method area and click OK. This opens the Depreciation Table dialog box.
2. Right-click within the Depreciation area (which opens a pop-up menu) and select Load.
3. Enter SASHELP.MACRS10 for the Dataset Name.

The dialog box should look like Figure 50.11.

Figure 50.11 MACRS Percentages for a Ten-Year Property
Click OK to return to the Depreciation dialog box. Click Create Depreciation Schedule and the depreciation schedule fills (see Figure 50.12).
Note that there are eleven entries in this depreciation schedule. This is because of the half-year convention, which lets you deduct a half year of depreciation in the first year and leaves a half year to deduct after the useful life is over.

Figure 50.12 Depreciation Table with MACRS10
Click OK to return to the Investment Analysis dialog box.
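The MACRS rate tables mentioned above are ordinary SAS data sets, so you can also inspect them directly; for example:

   proc print data=sashelp.macrs10;   /* yearly rates for a ten-year property */
   run;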
Bond Tasks

Suppose someone offers to sell you a 20-year utility bond that was issued six years ago. It has a $1,000 face value and pays semiyear coupons at 2%. You can purchase it for $780. Would you be satisfied with this bond if you expect an 8% minimum attractive rate of return (MARR)? In the Investment Analysis dialog box, select Investment → New → Bond from the menu bar to open the Bond dialog box.
Specifying Bond Terms

To specify the bond, follow these steps:

1. Enter UTILITY_BOND for the Name.
2. Enter 1000 for the Face Value.
3. Enter 2 for the Coupon Rate. The Coupon Payment updates to 20.
4. Select SEMIYEAR for Coupon Interval.
5. Enter 28 for the Number of Coupons. Because 14 years remain before the bond matures, the bond still has 28 semiyear coupons to pay. The Maturity Date updates.
Computing the Price from Yield

Enter 8 for Yield within the Valuation area. You see that the bond’s value would be $666.72, as in Figure 50.13.

Figure 50.13 Bond Value
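You can verify this value with a short DATA step that discounts the remaining coupons and the face value at the semiyear yield; a sketch:

   data _null_;
      face   = 1000;
      coupon = 0.02 * face;   /* $20 per semiyear coupon */
      n      = 28;            /* remaining coupons */
      y      = 0.08 / 2;      /* 8% nominal, per half-year */
      price  = coupon * (1 - (1+y)**-n) / y + face * (1+y)**-n;
      put price= 8.2;         /* about 666.7, matching Figure 50.13 */
   run;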
Computing the Yield from Price Now enter 780 for Value within the Valuation area. You see the yield is only 6.5%, as in Figure 50.14. This is not acceptable if you desire an 8% MARR.
Figure 50.14 Bond Yield
Performing Bond Analysis

To perform bond-pricing analysis, follow these steps:

1. Click Analyze to open the Bond Analysis dialog box.
2. Enter 8.0 as the Yield to Maturity.
3. Enter 4.0 as the +/-.
4. Enter 0.5 as the Increment by.
5. Enter 780 as the Reference Price.
6. Click Create Bond Valuation Summary.

The Bond Valuation Summary area fills and shows you the values for various yields, as in Figure 50.15.
Figure 50.15 Bond Price Analysis
Creating a Price versus Yield-to-Maturity Graph Click Graphics to open the Bond Price dialog box. This contains the price versus yield-to-maturity graph shown in Figure 50.16.
Figure 50.16 Bond Price Graph
Click Return to return to the Bond Analysis dialog box. In the Bond Analysis dialog box, click OK to return to the Bond dialog box. In the Bond dialog box, click OK to return to the Investment Analysis dialog box.
Generic Cashflow Tasks

To specify a generic cashflow, you merely define any sequence of date-amount pairs. The flexibility of generic cashflows enables you to represent economic alternatives or investments that do not fit into loan, savings, depreciation, or bond specifications. In the Investment Analysis dialog box, select Investment → New → Generic Cashflow from the menu bar to open the Generic Cashflow dialog box. Enter RETAIL for the Name, as in Figure 50.17.
Figure 50.17 Introducing the Generic Cashflow
Right-Clicking within the Cashflow Specification Area Right-clicking within Generic Cashflow’s Cashflow Specification area reveals the pop-up menu displayed in Figure 50.18. The menu provides many useful tools to assist you in creating these date-amount pairs. Figure 50.18 Right-Clicking within the Cashflow Specification Area
The following sections describe how to use most of these right-click options. The Specify and Forecast menu items are described in the sections “Including a Generated Cashflow” on page 3011 and “Including a Forecasted Cashflow” on page 3013.
Adding a New Date-Amount Pair
To add a new date-amount pair manually, follow these steps: 1. Right-click in the Cashflow Specification area as shown in Figure 50.18, and release on Add. 2. Enter 01JAN01 for the date. 3. Enter 100 for the amount.
Copying a Date-Amount Pair
To copy a selected date-amount pair, follow these steps: 1. Select the pair you just created. 2. Right-click in the Cashflow Specification area as shown in Figure 50.18, but this time release on Copy.
Sorting All of the Date-Amount Pairs
Change the second date to 01JAN00. Now the dates are unsorted. Right-click in the Cashflow Specification area as shown in Figure 50.18, and release on Sort.
Deleting a Date-Amount Pair
To delete a selected date-amount pair, follow these steps: 1. Select a date-amount pair. 2. Right-click in the Cashflow Specification area as shown in Figure 50.18, and release on Delete.
Clearing All of the Date-Amount Pairs
To clear all date-amount pairs, right-click in the Cashflow Specification area as shown in Figure 50.18, and release on Clear.
Loading Date-Amount Pairs from a Data Set
To load date-amount pairs from a SAS data set into the Cashflow Specification area, follow these steps: 1. Right-click in the Cashflow Specification area, and release on Load. This opens the Load Dataset dialog box. 2. Enter SASHELP.RETAIL for Dataset Name.
3. Click OK to return to the Generic Cashflow dialog box.

If there is a Date variable in the SAS data set, Investment Analysis loads it into the list. If there is no variable named Date, it loads the first available date- or datetime-formatted variable. Investment Analysis then searches the SAS data set for an Amount variable to use. If none exists, it takes the first numeric variable that is not used by the Date variable.
Saving Date-Amount Pairs to a Data Set
To save date-amount pairs from the Cashflow Specification area to a SAS data set, follow these steps:

1. Right-click in the Cashflow Specification area and release on Save. This opens the Save Dataset dialog box.
2. Enter the name of the SAS data set for Dataset Name.
3. Click OK to return to the Generic Cashflow dialog box.
Including a Generated Cashflow

To generate date-amount pairs for the Cashflow Specification area, follow these steps:

1. Right-click in the Cashflow Specification area and release on Specify. This opens the Flow Specification dialog box.
2. Select YEAR for the Time Interval.
3. Enter today’s date for the Starting Date.
4. Enter 10 for the Number of Periods. The Ending Date updates.
5. Enter 100 for the Level. You can visualize the specification in the Cashflow Chart area (see Figure 50.19).
6. Click Add to add the specified cashflow to the list in the Generic Cashflow dialog box. Clicking Add also returns you to the Generic Cashflow dialog box.
Figure 50.19 Uniform Cashflow Specification
Clicking Subtract subtracts the current cashflow from the Generic Cashflow dialog box, and it returns you to the Generic Cashflow dialog box. You can generate arithmetic and geometric specifications by clicking them within the Series Flow Type area. However, you must enter a value for the Gradient. In both cases the Level value is the value of the list at the Starting Date. With an arithmetic flow type, entries increment by the value Gradient for each Time Interval. With a geometric flow type, entries increase by the factor Gradient for each Time Interval. Figure 50.20 displays an arithmetic cashflow with a Level of 100 and a Gradient of 12.
Figure 50.20 Arithmetic Cashflow Specification
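If you would rather build the same arithmetic cashflow as a data set to load, a DATA step such as the following sketch works (the output data set name is hypothetical):

   data work.flows;
      level = 100; gradient = 12;
      do i = 0 to 9;                          /* ten yearly periods */
         date   = intnx('year', today(), i);
         amount = level + i*gradient;         /* add the gradient each period */
         output;
      end;
      format date date9.;
      keep date amount;
   run;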
Including a Forecasted Cashflow

To generate date-amount pairs for the Cashflow Specification area, follow these steps:

1. Right-click in the Cashflow Specification area and release on Forecast to open the Forecast Specification dialog box.
2. Enter sashelp.retail as the Data Set.
3. Select SALES for the Analysis Variable.
4. Click Compute Forecast to generate the forecast. You can visualize the forecast in the Cashflow Chart area (see Figure 50.21).
5. Click Add to add the forecast to the list in the Generic Cashflow dialog box. Clicking Add also returns you to the Generic Cashflow dialog box.
Figure 50.21 Cashflow Forecast
Clicking Subtract subtracts the current forecast from the Generic Cashflow dialog box, and it returns you to the Generic Cashflow dialog box. To review the values from the SAS data set you forecast, click View Table or View Graph. You can adjust the following values for the SAS data set you forecast: Time ID Variable, Time Interval, and Analysis Variable. You can adjust the following values for the forecast: the Horizon, the Confidence, and choice of predicted value, lower confidence limit, and upper confidence limit.
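A comparable batch forecast of the same series can be produced with the ESM procedure; a sketch:

   proc esm data=sashelp.retail out=work.fcast lead=4;
      id date interval=qtr;   /* SASHELP.RETAIL is a quarterly series */
      forecast sales;
   run;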
Using the Cashflow Chart

Three dialog boxes contain the Cashflow Chart to aid in your visualization of cashflows: Generic Cashflow, Flow Specification, and Forecast Specification. Within this chart, you have the following tools:

You can click a bar in the plot and view its Cashflow Date and Cashflow Amount.

You can change the aggregation period of the view with the box in the lower-left corner of the Cashflow Chart. For example, you can take the quarterly sales figures from the previous example, select YEAR as the value for this box, and view the annual sales figures.

You can change the number in the box to the right of the horizontal scroll bar to alter the number of entries you want to view. The number in that box must be no greater than the number of entries in the cashflow list. Lessening this number has the effect of zooming in on a portion of the cashflow. When the number is less than the number of entries in the cashflow list, you can use the scroll bar at the bottom of the chart to scroll through the chart.
Dialog Box Guide
Loan

Selecting Investment → New → Loan from the Investment Analysis dialog box's menu bar opens the Loan dialog box displayed in Figure 50.22.

Figure 50.22 Loan Dialog Box

The following items are displayed:
Name holds the name you assign to the loan. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
The Loan Specification area gives access to the values that define the loan.
Loan Amount holds the borrowed amount.
Periodic Payment holds the value of the periodic payments.
Number of Payments holds the number of payments in the loan's term.
Payment Interval holds the frequency of the Periodic Payment.
Compounding Interval holds the compounding frequency.
Initial Rate holds the interest rate (a nominal percentage between 0 and 120) you pay on the loan.
Start Date holds the SAS date when the loan is initialized. The first payment is due one Payment Interval after this time.
Initialization opens the Loan Initialization Options dialog box where you can define initialization costs and downpayments relevant to the loan.
Prepayments opens the Loan Prepayments dialog box where you can specify the SAS dates and amounts of any prepayments.
Balloon Payments opens the Balloon Payments dialog box where you can specify the SAS dates and amounts of any balloon payments.
Rate Adjustments opens the Rate Adjustment Terms dialog box where you can specify terms for a variable-rate loan.
Rounding Off opens the Rounding Off dialog box where you can select the number of decimal places for calculations.
Create Amortization Schedule becomes available when you adequately define the loan within the Loan Specification area. Clicking it generates the amortization schedule.
Amortization Schedule fills when you click Create Amortization Schedule. The schedule contains a row for the loan's start date and each payment date with the following information:
Date is a SAS date, either the loan's start date or a payment date.
Beginning Principal Amount is the balance at that date.
Periodic Payment Amount is the expected payment at that date.
Interest Payment is zero for the loan's start date; otherwise it holds the interest since the previous date.
Principal Repayment is the amount of the payment that went toward the principal.
Ending Principal is the balance at the end of the payment interval.
Print becomes available when you generate the amortization schedule. Clicking it sends the contents of the amortization schedule to the SAS session print device.
Save Data As becomes available when you generate the amortization schedule. Clicking it opens the Save Output Dataset dialog box where you can save the amortization table (or portions thereof) as a SAS data set.
OK returns you to the Investment Analysis dialog box. If this is a new loan specification, clicking OK appends the current loan specification to the portfolio. If this is an existing loan specification, clicking OK returns the altered loan specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new loan specification, clicking Cancel discards the current loan specification. If this is an existing loan specification, clicking Cancel discards the current changes.
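An amortization schedule of this kind can also be produced in batch with the LOAN procedure in SAS/ETS (see Chapter 16). A minimal sketch with hypothetical loan terms:

   proc loan start=2010:1;
      fixed amount=100000 rate=7.5 life=360
            label='hypothetical 30-year monthly loan'
            schedule=yearly;      /* print a yearly amortization schedule */
   run;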
Loan Initialization Options

Clicking Initialization in the Loan dialog box opens the Loan Initialization Options dialog box displayed in Figure 50.23.

Figure 50.23 Loan Initialization Options Dialog Box

The following items are displayed:
The Price, Loan Amount and Downpayment area contains the following information:
Purchase Price holds the actual price of the asset. This value equals the loan amount plus the downpayment.
Loan Amount holds the loan amount.
% of Price (to the right of Loan Amount) updates when you enter the Purchase Price and either the Loan Amount or the Downpayment. It holds the percentage of the Purchase Price that the Loan Amount represents. Setting the percentage manually causes the Loan Amount and Downpayment to update.
Downpayment holds any downpayment paid for the asset.
% of Price (to the right of Downpayment) updates when you enter the Purchase Price and either the Loan Amount or the Downpayment. It holds the percentage of the Purchase Price that the Downpayment represents. Setting the percentage manually causes the Loan Amount and Downpayment to update.
The Initialization Costs and Discount Points area contains the following information:
Loan Amount holds a copy of the Loan Amount above.
Initialization Costs holds the value of any initialization costs.
% of Amount (to the right of Initialization Costs) updates when you enter the Purchase Price and either the Initialization Costs or the Discount Points. It holds the percentage of the Loan Amount that the Initialization Costs represent. Setting the percentage manually causes the Initialization Costs to update.
Discount Points holds the value of any discount points.
% of Amount (to the right of Discount Points) updates when you enter the Purchase Price and either the Initialization Costs or the Discount Points. It holds the percentage of the Loan Amount that the Discount Points represent. Setting the percentage manually causes the Discount Points to update.
OK returns you to the Loan dialog box, saving the information that is entered.
Cancel returns you to the Loan dialog box, discarding any changes made since you opened the dialog box.
Loan Prepayments

Clicking Prepayments in the Loan dialog box opens the Loan Prepayments dialog box displayed in Figure 50.24.

Figure 50.24 Loan Prepayments Dialog Box

The following items are displayed:
Uniform Prepayment holds the value of a regular prepayment concurrent to the usual periodic payment.
Prepayments holds a list of date-amount pairs to accommodate any prepayments. Right-clicking within the Prepayments area reveals many helpful tools for managing date-amount pairs.
OK returns you to the Loan dialog box, storing the information entered on the prepayments.
Cancel returns you to the Loan dialog box, discarding any prepayments entered since you opened the dialog box.
Balloon Payments

Clicking Balloon Payments in the Loan dialog box opens the Balloon Payments dialog box displayed in Figure 50.25.

Figure 50.25 Balloon Payments Dialog Box

The following items are displayed:
Balloon Payments holds a list of date-amount pairs to accommodate any balloon payments. Right-clicking within the Balloon Payments area reveals many helpful tools for managing date-amount pairs.
OK returns you to the Loan dialog box, storing the information entered on the balloon payments.
Cancel returns you to the Loan dialog box, discarding any balloon payments entered since you opened the dialog box.
Rate Adjustment Terms

Clicking Rate Adjustments in the Loan dialog box opens the Rate Adjustment Terms dialog box displayed in Figure 50.26.

Figure 50.26 Rate Adjustment Terms Dialog Box

The following items are displayed:
The Rate Adjustment Terms area:
Life Cap holds the maximum deviation from the Initial Rate allowed over the life of the loan.
Periodic Cap holds the maximum adjustment allowed per adjustment.
Adjustment Frequency holds how often (in months) the lender can adjust the interest rate.
The Rate Adjustment Assumption determines the scenario that the adjustments take:
Worst Case uses information from the Rate Adjustment Terms area to forecast a worst-case scenario.
Best Case uses information from the Rate Adjustment Terms area to forecast a best-case scenario.
Fixed Case specifies a fixed-rate loan.
Estimated Case uses information from the Rate Adjustment Terms and Estimated Rates areas to forecast a scenario based on the estimated rates.
Estimated Rates holds a list of date-rate pairs, where each date is a SAS date and each rate is a nominal percentage between 0 and 120. The Estimated Case assumption uses these rates for its calculations. Right-clicking within the Estimated Rates area reveals many helpful tools for managing date-rate pairs.
OK returns you to the Loan dialog box, taking rate adjustment information into account.
Cancel returns you to the Loan dialog box, discarding any rate adjustment information provided since opening the dialog box.
Rounding Off

Clicking Rounding Off in the Loan dialog box opens the Rounding Off dialog box displayed in Figure 50.27.

Figure 50.27 Rounding Off Dialog Box

The following items are displayed:
Decimal Places fixes the number of decimal places your results will display.
OK returns you to the Loan dialog box. Numeric values will then be represented with the number of decimals specified in Decimal Places.
Cancel returns you to the Loan dialog box. Numeric values will be represented with the number of decimals specified prior to opening this dialog box.

Savings

Selecting Investment → New → Savings from the Investment Analysis dialog box's menu bar opens the Savings dialog box displayed in Figure 50.28.
Figure 50.28 Savings Dialog Box
The following items are displayed:
Name holds the name you assign to the savings. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
The Savings Specification area:
Periodic Deposit holds the value of your regular deposits.
Number of Deposits holds the number of deposits into the account.
Initial Rate holds the interest rate (a nominal percentage between 0 and 120) the savings account earns.
Start Date holds the SAS date when deposits begin.
Deposit Interval holds the frequency of your Periodic Deposit.
Compounding Interval holds how often the interest compounds.
Create Account Summary becomes available when you adequately define the savings within the Savings Specification area. Clicking it generates the account summary.
Account Summary fills when you click Create Account Summary. The summary contains a row for each deposit date with the following information:
Date is the SAS date of a deposit.
Starting Balance is the balance at that date.
Deposits is the deposit at that date.
Interest Earned is the interest earned since the previous date.
Ending Balance is the balance after the deposit.
Print becomes available when you generate an account summary. Clicking it sends the contents of the account summary to the SAS session print device.
Save Data As becomes available when you generate an account summary. Clicking it opens the Save Output Dataset dialog box where you can save the account summary (or portions thereof) as a SAS data set.
OK returns you to the Investment Analysis dialog box. If this is a new savings specification, clicking OK appends the current savings specification to the portfolio. If this is an existing savings specification, clicking OK returns the altered savings specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new savings specification, clicking Cancel discards the current savings specification. If this is an existing savings specification, clicking Cancel discards the current changes.
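For a quick check of an account summary's ending balance, you can use the SAVING function in a DATA step. It relates a future amount, a periodic deposit, a periodic rate, and a number of deposits, and solves for whichever argument you set to missing; the terms below are hypothetical, and the deposit-timing convention is the one documented for the function:

   data _null_;
      /* balance after 60 monthly deposits of 100 at 1% per month */
      f = saving(., 100, 0.01, 60);
      put 'Ending balance: ' f 12.2;
   run;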
Depreciation

Selecting Investment → New → Depreciation from the Investment Analysis dialog box's menu bar opens the Depreciation dialog box displayed in Figure 50.29.

Figure 50.29 Depreciation Dialog Box

The following items are displayed:
Name holds the name you assign to the depreciation. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
Depreciable Asset Specification:
Cost holds the asset's original cost.
Year of Purchase holds the asset's year of purchase.
Useful Life holds the asset's useful life (in years).
Salvage Value holds the asset's value at the end of its Useful Life.
The Depreciation Method area holds the depreciation methods available:
Straight Line
Sum-of-years Digits
Depreciation Table
Declining Balance
– DB Factor: choice of 2, 1.5, or 1
– Conversion to SL: choice of Yes or No
Create Depreciation Schedule becomes available when you adequately define the depreciation within the Depreciable Asset Specification area. Clicking the Create Depreciation Schedule button then fills the Depreciation Schedule area.
Depreciation Schedule fills when you click Create Depreciation Schedule. The schedule contains a row for each year. Each row holds the following:
Year is a year.
Start Book Value is the starting book value for that year.
Depreciation is the depreciation value for that year.
End Book Value is the ending book value for that year.
Print becomes available when you generate the depreciation schedule. Clicking it sends the contents of the depreciation schedule to the SAS session print device.
Save Data As becomes available when you generate the depreciation schedule. Clicking it opens the Save Output Dataset dialog box where you can save the depreciation table (or portions thereof) as a SAS data set.
OK returns you to the Investment Analysis dialog box. If this is a new depreciation specification, clicking OK appends the current depreciation specification to the portfolio. If this is an existing depreciation specification, clicking OK returns the altered depreciation specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new depreciation specification, clicking Cancel discards the current depreciation specification. If this is an existing depreciation specification, clicking Cancel discards the current changes.
Depreciation Table

Clicking Depreciation Table from the Depreciation Method area of the Depreciation dialog box opens the Depreciation Table dialog box displayed in Figure 50.30.

Figure 50.30 Depreciation Table Dialog Box

The following items are displayed:
The Depreciation area holds a list of year-rate pairs where each rate is an annual depreciation rate (a percentage between 0% and 100%). Right-clicking within the Depreciation area reveals many helpful tools for managing year-rate pairs.
OK returns you to the Depreciation dialog box with the current list of depreciation rates from the Depreciation area.
Cancel returns you to the Depreciation dialog box, discarding any edits to the Depreciation area since you opened the dialog box.
Bond

Selecting Investment → New → Bond from the Investment Analysis dialog box's menu bar opens the Bond dialog box displayed in Figure 50.31.
Figure 50.31 Bond
The following items are displayed:
Name holds the name you assign to the bond. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
Bond Specification:
Face Value holds the bond's value at maturity.
Coupon Payment holds the amount of money you receive periodically as the bond matures.
Coupon Rate holds the rate (a nominal percentage between 0% and 120%) of the Face Value that defines the Coupon Payment.
Coupon Interval holds how often the bond pays its coupons.
Number of Coupons holds the number of coupons before maturity.
Maturity Date holds the SAS date when you can redeem the bond for its Face Value.
The Valuation area becomes available when you adequately define the bond within the Bond Specification area. Entering either the Value or the Yield causes the calculation of the other. If you respecify the bond after performing a calculation here, you must reenter the Value or Yield to update the calculation.
Value holds the bond's value if the specified Yield is expected.
Yield holds the bond's yield if the bond is valued at the amount of Value.
You must specify the bond before analyzing it. After you have specified the bond, clicking Analyze opens the Bond Analysis dialog box where you can compare various values and yields.
OK returns you to the Investment Analysis dialog box. If this is a new bond specification, clicking OK appends the current bond specification to the portfolio. If this is an existing bond specification, clicking OK returns the altered bond specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new bond specification, clicking Cancel discards the current bond specification. If this is an existing bond specification, clicking Cancel discards the current changes.
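The price-yield relationship in the Valuation area is standard present-value arithmetic: the bond's value is the discounted sum of its remaining coupons plus its discounted face value. The sketch below shows the textbook formula with hypothetical bond terms; Investment Analysis's exact day-count and compounding conventions may differ:

   data _null_;
      face   = 1000;      /* face value                        */
      coupon = 40;        /* coupon payment per period         */
      n      = 20;        /* number of coupons before maturity */
      y      = 0.045;     /* yield per coupon period           */
      value  = 0;
      do t = 1 to n;
         value + coupon / (1+y)**t;   /* discounted coupons    */
      end;
      value + face / (1+y)**n;        /* discounted face value */
      put 'Bond value: ' value 12.2;
   run;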
Bond Analysis

Clicking Analyze from the Bond dialog box opens the Bond Analysis dialog box displayed in Figure 50.32.

Figure 50.32 Bond Analysis

The following items are displayed:
Analysis Specifications:
Yield-to-maturity holds the percentage yield upon which to center the analysis.
+/- holds the maximum deviation percentage to consider from the Yield-to-maturity.
Increment by holds the percentage increment by which the analysis is calculated.
Reference Price holds the reference price.
Analysis Dates holds a list of SAS dates for which you perform the bond analysis.
You must specify the analysis before valuing the bond for the various yields. After you adequately specify the analysis, click Create Bond Valuation Summary to generate the bond valuation summary.
Bond Valuation Summary fills when you click Create Bond Valuation Summary. The summary contains a row for each rate with the following information:
Date is the SAS date when the Value gives the particular Yield.
Yield is the percent yield that corresponds to the Value at the given Date.
Value is the value of the bond at Date for the given Yield.
Percent Change is the percent change if the Reference Price is specified.
Duration is the duration.
Convexity is the convexity.
Graphics opens the Bond Price graph, which plots the price versus the yield-to-maturity.
Print becomes available when you generate the Bond Valuation Summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you fill the Bond Valuation Summary area. Clicking it opens the Save Output Dataset dialog box where you can save the valuation summary (or portions thereof) as a SAS data set.
Return takes you back to the Bond dialog box.

Bond Price

Clicking Graphics from the Bond Analysis dialog box opens the Bond Price dialog box displayed in Figure 50.33.
Figure 50.33 Bond Price Graph
The following item is displayed:
Return takes you back to the Bond Analysis dialog box.
Generic Cashflow

Selecting Investment → New → Generic Cashflow from the Investment Analysis dialog box's menu bar opens the Generic Cashflow dialog box displayed in Figure 50.34.

Figure 50.34 Generic Cashflow
The following items are displayed:
Name holds the name you assign to the generic cashflow. You can set the name here or within the Portfolio area of the Investment Analysis dialog box. This must be a valid SAS name.
Cashflow Specification holds date-amount pairs that correspond to deposits and withdrawals (or benefits and costs) for the cashflow. Each date is a SAS date. Right-clicking within the Cashflow Specification area reveals many helpful tools for managing date-amount pairs.
The Cashflow Chart fills with a graph representing the cashflow when the Cashflow Specification area is nonempty. The box to the right of the scroll bar controls the number of entries with which to fill the graph. If the number in this box is less than the total number of entries, you can use the scroll bar to view different segments of the cashflow. The left box below the scroll bar holds the frequency for drilling purposes.
OK returns you to the Investment Analysis dialog box. If this is a new generic cashflow specification, clicking OK appends the current cashflow specification to the portfolio. If this is an existing cashflow specification, clicking OK returns the altered cashflow specification to the portfolio.
Cancel returns you to the Investment Analysis dialog box. If this is a new cashflow specification, clicking Cancel discards the current cashflow specification. If this is an existing cashflow specification, clicking Cancel discards the current changes.

Right-Clicking within Generic Cashflow's Cashflow Specification Area

Right-clicking within the Cashflow Specification area of the Generic Cashflow dialog box opens the menu displayed in Figure 50.35.

Figure 50.35 Right-Clicking

Add creates a blank pair.
Delete removes the currently highlighted pair.
Copy duplicates the currently selected pair.
Sort arranges the entered pairs in chronological order.
Clear empties the area of all pairs.
Save opens the Save Dataset dialog box where you can save the entered pairs as a SAS data set for later use.
Load opens the Load Dataset dialog box where you select a SAS data set to populate the area.
Specify opens the Flow Specification dialog box where you can generate date-amount pairs to include in your cashflow.
Forecast opens the Forecast Specification dialog box where you can generate the forecast of a SAS data set to include in your cashflow.
If you want to perform one of these actions on a collection of pairs, you must select the collection of pairs before right-clicking. To select an adjacent list of pairs: click the first pair, hold down the SHIFT key, and click the final pair. After the list of pairs is selected, you can release the SHIFT key.

Flow Specification

Figure 50.36 Flow Specification

The following items are displayed:
Flow Time Specification:
Time Interval holds the uniform frequency of the entries.
You can set the Starting Date when you set the Time Interval. It holds the SAS date the entries start.
You can set the Ending Date when you set the Time Interval. It holds the SAS date the entries end.
Number of Periods holds the number of entries.
Flow Value Specification:
Series Flow Type describes the pattern the entries follow:
Uniform assumes all entries are equal.
Arithmetic assumes the entries begin at Level and increase by the value of Gradient per entry.
Geometric assumes the entries begin at Level and increase by a factor of Gradient per entry.
Level holds the starting amount for all flow types.
You can set the Gradient when you select either the Arithmetic or the Geometric series flow type. It holds the arithmetic or geometric gradient, respectively, for the Arithmetic and Geometric flow types.
When the cashflow entries are adequately defined, the Cashflow Chart fills with a graph displaying the dates and values of the entries. The box to the right of the scroll bar controls the number of entries with which to fill the graph. If the number in this box is less than the total number of entries, you can use the scroll bar to view different segments of the cashflow. The left box below the scroll bar holds the frequency.
Subtract becomes available when the collection of entries is adequately specified. Clicking Subtract then returns you to the Generic Cashflow dialog box and subtracts the entries from the current cashflow.
Add becomes available when the collection of entries is adequately specified. Clicking Add then returns you to the Generic Cashflow dialog box and adds the entries to the current cashflow.
Cancel returns you to the Generic Cashflow dialog box without changing the cashflow.
Forecast Specification

Figure 50.37 Forecast Specification

The following items are displayed:
Historical Data Specification:
Data Set holds the name of the SAS data set to forecast.
Browse opens the standard SAS Open dialog box to help select a SAS data set to forecast.
Time ID Variable holds the time ID variable to forecast over.
Time Interval fixes the time interval for the Time ID Variable.
Analysis Variable holds the data variable upon which to forecast.
View Table opens a table that displays the contents of the specified SAS data set.
View Graph opens the Time Series Viewer, which graphically displays the contents of the specified SAS data set.
Forecast Specification:
Horizon holds the number of periods into the future you want to forecast.
Confidence holds the confidence limit for applicable forecasts.
Compute Forecast fills the Cashflow Chart with the forecast.
The box below Forecast Specification holds the type of forecast you want to generate:
Predicted Value
Lower Confidence Limit
Upper Confidence Limit
The Cashflow Chart fills when you click Compute Forecast. The box to the right of the scroll bar controls the number of entries with which to fill the graph. If the number in this box is less than the total number of entries, you can use the scroll bar to view different segments of the cashflow. The left box below the scroll bar holds the frequency.
Subtract becomes available when the collection of entries is adequately specified. Clicking Subtract then returns you to the Generic Cashflow dialog box and subtracts the forecast from the current cashflow.
Add becomes available when the collection of entries is adequately specified. Clicking Add then returns you to the Generic Cashflow dialog box and adds the forecast to the current cashflow.
Cancel returns you to the Generic Cashflow dialog box without changing the cashflow.
Chapter 51
Computations

Contents
The Compute Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3035
Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3036
    Taxing a Cashflow . . . . . . . . . . . . . . . . . . . . . . . . . . . 3036
    Converting Currency . . . . . . . . . . . . . . . . . . . . . . . . . . 3038
    Deflating Cashflows . . . . . . . . . . . . . . . . . . . . . . . . . . 3040
Dialog Box Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3042
    After Tax Cashflow Calculation . . . . . . . . . . . . . . . . . . . . 3042
    Currency Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . 3043
    Constant Dollar Calculation . . . . . . . . . . . . . . . . . . . . . . 3045
The Compute Menu

Figure 51.1 shows the Compute menu.

Figure 51.1 The Compute Menu

The Compute menu offers the following options that apply to generic cashflows.
After Tax Cashflow opens the After Tax Cashflow Calculation dialog box. Computing an after-tax cashflow is useful when taxes affect investment alternatives differently. Comparing after-tax cashflows provides a more accurate determination of the cashflows' profitabilities. You can set default values for income tax rates by selecting Tools → Define Rate → Income Tax Rate from the Investment Analysis dialog box. This opens the Income Tax Specification dialog box where you can enter the tax rates.
Currency Conversion opens the Currency Conversion dialog box. Currency conversion is necessary when investments are made in different currencies. For data concerning currency conversion rates, see http://dsbb.imf.org/, the International Monetary Fund's Dissemination Standards Bulletin Board.
Constant Dollars opens the Constant Dollar Calculation dialog box. A constant dollar (inflation-adjusted monetary value) calculation takes cashflow and inflation information and discounts the cashflow to a level where the buying power of the monetary unit is constant over time. Groups quantify inflation (in the form of price indices and inflation rates) for countries and industries by averaging the growth of prices for various products and sectors of the economy. For data concerning price indices, see the United States Department of Labor at http://www.dol.gov/ and the International Monetary Fund's Dissemination Standards Bulletin Board at http://dsbb.imf.org/. You can set default values for inflation rates by selecting Tools → Define Rate → Inflation from the Investment Analysis dialog box. This opens the Inflation Specification dialog box where you can enter the inflation rates.

Tasks

The next few sections show how to perform computations for the following situation. Suppose you buy a $10,000 certificate of deposit that pays 12% interest a year for five years. Your earnings are taxed at a rate of 30% federally and 7% locally. Also, you want to transfer all the money to an account in England. American dollars convert to British pounds at an exchange rate of $1.00 to £0.60. The inflation rate in England is 3%.
The instructions in this example assume familiarity with the following:
the right-clicking options of the Cashflow Specification area in the Generic Cashflow dialog box (described in the section “Right-Clicking within Generic Cashflow's Cashflow Specification Area” on page 3030)
the Save Data As button located in many dialog boxes (described in the section “Saving Output to SAS Data Sets” on page 3063)

Taxing a Cashflow

Consider the example described in the section “The Compute Menu” on page 3035. To create the earnings, follow these steps:
1. Select Investment → New → Generic Cashflow to create a generic cashflow.
2. Enter CD_INTEREST for the Name.
3. Enter 1200 for each of the five years starting one year from today, as displayed in Figure 51.2.
4. Click OK to return to the Investment Analysis dialog box.

Figure 51.2 Computing the Interest on the CD
To compute the tax on the earnings, follow these steps:
1. Select CD_INTEREST from the Portfolio area.
2. Select Compute → After Tax Cashflow from the pull-down menu.
3. Enter 30 for Federal Tax.
4. Enter 7 for Local Tax. Note that Combined Tax updates.
5. Click Create After Tax Cashflow. The After Tax Cashflow area fills, as displayed in Figure 51.3.
Figure 51.3 Computing the Interest After Taxes
Save the taxed earnings to a SAS data set named WORK.CD_AFTERTAX. Click Return to return to the Investment Analysis dialog box.
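A DATA step can replicate these after-tax amounts. The combined rate below uses the common convention that local taxes are deductible against federal taxes, that is, combined = federal + local × (1 − federal); that convention is an assumption here, so compare the result with the Combined Tax value the dialog box reports:

   data work.cd_aftertax2;                       /* hypothetical data set name */
      federal  = 0.30;
      local    = 0.07;
      combined = federal + local*(1 - federal);  /* assumed deductibility      */
      do i = 1 to 5;
         date   = intnx('year', today(), i);
         amount = 1200 * (1 - combined);         /* interest kept after taxes  */
         output;
      end;
      format date date9.;
      keep date amount;
   run;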
Converting Currency

Consider the example described in the section “The Compute Menu” on page 3035. To create the cashflow to convert, follow these steps:
1. Select Investment → New → Generic Cashflow to open a new generic cashflow.
2. Enter CD_DOLLARS for the Name.
3. Load WORK.CD_AFTERTAX into its Cashflow Specification.
4. Add –10,000 for today and +10,000 for five years from today to the cashflow, as displayed in Figure 51.4.
5. Sort the transactions by date to aid your reading.
6. Click OK to return to the Investment Analysis dialog box.
Figure 51.4 The CD in Dollars
To convert the amounts from American dollars to British pounds, follow these steps:
1. Select CD_DOLLARS from the portfolio.
2. Select Compute → Currency Conversion from the pull-down menu. This opens the Currency Conversion dialog box.
3. Select USD for the From Currency.
4. Select GBP for the To Currency.
5. Enter 0.60 for the Exchange Rate.
6. Click Apply Currency Conversion to fill the Currency Conversion area as displayed in Figure 51.5.
Figure 51.5 Converting the CD to Pounds
Save the converted values to a SAS data set named WORK.CD_POUNDS. Click Return to return to the Investment Analysis dialog box.
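The conversion itself is a single multiplication per cashflow item. A sketch that converts the saved after-tax cashflow (the output data set name is hypothetical; the Date and Amount variable names follow the Load tool's conventions):

   data work.cd_pounds2;                 /* hypothetical data set name   */
      set work.cd_aftertax;              /* date-amount pairs in dollars */
      rate   = 0.60;                     /* pounds per dollar            */
      amount = amount * rate;            /* amount in pounds             */
      drop rate;
   run;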
Deflating Cashflows

Consider the example described in the section “The Compute Menu” on page 3035. To create the cashflow to deflate, follow these steps:
1. Select Investment → New → Generic Cashflow to open a new generic cashflow.
2. Enter CD_DEFLATED for the Name.
3. Load WORK.CD_POUNDS into its Cashflow Specification (see Figure 51.6).
4. Click OK to return to the Investment Analysis dialog box.
Figure 51.6 The CD before Deflation
To deflate the values, follow these steps:
1. Select CD_DEFLATED from the portfolio.
2. Select Compute → Constant Dollars from the menu. This opens the Constant Dollar Calculation dialog box.
3. Clear the Variable Inflation List area.
4. Enter 3 for the Constant Inflation Rate.
5. Click Create Constant Dollar Equivalent to generate a constant dollar equivalent summary (see Figure 51.7).
Figure 51.7 CD Values after Deflation
You can save the deflated cashflow to a SAS data set for use in an internal rate of return analysis or breakeven analysis. Click Return to return to the Investment Analysis dialog box.
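The deflation discounts each amount by the cumulative inflation between the base date and the cashflow date. A sketch of the arithmetic, assuming annual compounding of the constant 3% rate (Investment Analysis's exact day-count treatment may differ):

   data work.cd_deflated2;                /* hypothetical data set name   */
      set work.cd_pounds;                 /* date-amount pairs in pounds  */
      infl   = 0.03;                      /* constant annual inflation    */
      years  = yrdif(today(), date, 'act/act');   /* elapsed years        */
      amount = amount / (1 + infl)**years;        /* constant-currency    */
      drop infl years;
   run;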
Dialog Box Guide
After Tax Cashflow Calculation

Having selected a generic cashflow from the Investment Analysis dialog box, to perform an after-tax calculation, select Compute → After Tax Cashflow from the Investment Analysis dialog box's menu bar. This opens the After Tax Cashflow Calculation dialog box displayed in Figure 51.8.
Figure 51.8 After Tax Cashflow Calculation Dialog Box
The following items are displayed:
Name holds the name of the investment for which you are computing the after-tax cashflow.
Federal Tax holds the federal tax rate (a percentage between 0% and 100%).
Local Tax holds the local tax rate (a percentage between 0% and 100%).
Combined Tax holds the effective tax rate from federal and local income taxes.
Create After Tax Cashflow becomes available when Combined Tax is not empty. Clicking Create After Tax Cashflow then fills the After Tax Cashflow area.
After Tax Cashflow fills when you click Create After Tax Cashflow. It holds a list of date-amount pairs where each amount is the amount retained after taxes for that date.
Print becomes available when you fill the after-tax cashflow. Clicking it sends the contents of the after-tax cashflow to the SAS session print device.
Save Data As becomes available when you fill the after-tax cashflow. Clicking it opens the Save Output Dataset dialog box where you can save the resulting cashflow (or portions thereof) as a SAS data set.
Return returns you to the Investment Analysis dialog box.

Currency Conversion

Having selected a generic cashflow from the Investment Analysis dialog box, to perform a currency conversion, select Compute → Currency Conversion from the Investment Analysis dialog box's menu bar. This opens the Currency Conversion dialog box displayed in Figure 51.9.
Figure 51.9 Currency Conversion Dialog Box
The following items are displayed:
Name holds the name of the investment to which you are applying the currency conversion.
From Currency holds the name of the currency the cashflow currently represents.
To Currency holds the name of the currency to which you wish to convert.
Exchange Rate holds the rate of exchange between the From Currency and the To Currency.
Apply Currency Conversion becomes available when you fill Exchange Rate. Clicking Apply Currency Conversion fills the Currency Conversion area.
Currency Conversion fills when you click Apply Currency Conversion. The schedule contains a row for each cashflow item with the following information:
Date is a SAS date within the cashflow.
The From Currency value is the amount in the original currency at that date.
The To Currency value is the amount in the new currency at that date.
Print becomes available when you fill the Currency Conversion area. Clicking it sends the contents of the conversion table to the SAS session print device.
Save Data As becomes available when you fill the Currency Conversion area. Clicking it opens the Save Output Dataset dialog box where you can save the conversion table (or portions thereof) as a SAS data set.
Return returns you to the Investment Analysis dialog box.
Constant Dollar Calculation

Having selected a generic cashflow from the Investment Analysis dialog box, to perform a constant dollar calculation, select Compute → Constant Dollars from the Investment Analysis dialog box's menu bar. This opens the Constant Dollar Calculation dialog box displayed in Figure 51.10.

Figure 51.10 Constant Dollar Calculation Dialog Box

The following items are displayed:
Name holds the name of the investment for which you are computing the constant dollar values.
Constant Inflation Rate holds the constant inflation rate (a percentage between 0% and 120%). This value is used if the Variable Inflation List area is empty. If you assume a fixed inflation rate, simply enter that rate here.
Variable Inflation List holds date-rate pairs that describe how inflation varies over time. Each date is a SAS date, and each rate is a percentage between 0% and 120%. Each date refers to when that inflation rate begins. Right-clicking within the Variable Inflation List area reveals many helpful tools for managing date-rate pairs.
Dates holds the SAS date(s) at which you wish to compute the constant dollar equivalent. Right-clicking within the Dates area reveals many helpful tools for managing date lists.
Create Constant Dollar Equivalent becomes available when you enter inflation rate information. Clicking it fills the constant dollar equivalent summary with the computed constant dollar values.
Constant Dollar Equivalent Summary fills with a summary when you click Create Constant Dollar Equivalent. The first column lists the dates of the generic cashflow. The second column contains the constant dollar equivalent of the original generic cashflow item of that date.
Print becomes available when you fill the constant dollar equivalent summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you fill the constant dollar equivalent summary. Clicking it opens the Save Output Dataset dialog box where you can save the summary (or portions thereof) as a SAS data set.
Return returns you to the Investment Analysis dialog box.
Chapter 52
Analyses

Contents
The Analyze Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3047
Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3048
    Performing Time Value Analysis . . . . . . . . . . . . . . . . . . . . 3048
    Computing an Internal Rate of Return . . . . . . . . . . . . . . . . . 3050
    Performing a Benefit-Cost Ratio Analysis . . . . . . . . . . . . . . . 3051
    Computing a Uniform Periodic Equivalent . . . . . . . . . . . . . . . 3052
    Performing a Breakeven Analysis . . . . . . . . . . . . . . . . . . . 3053
Dialog Box Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3055
    Time Value Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 3055
    Uniform Periodic Equivalent . . . . . . . . . . . . . . . . . . . . . . 3057
    Internal Rate of Return . . . . . . . . . . . . . . . . . . . . . . . . 3058
    Benefit-Cost Ratio Analysis . . . . . . . . . . . . . . . . . . . . . . 3058
    Breakeven Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 3060
    Breakeven Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3061

The Analyze Menu

Figure 52.1 shows the Analyze menu.

Figure 52.1 Analyze Menu

The Analyze menu offers the following options for use on applicable investments.
Time Value opens the Time Value Analysis dialog box. Time value analysis involves moving money through time across a defined minimum attractive rate of return (MARR) so that you can compare values at a consistent date. The MARR can be constant or can vary over time.
Periodic Equivalent opens the Uniform Periodic Equivalent dialog box. Uniform periodic equivalent analysis determines the payment needed to convert a cashflow to uniform amounts over time, given a periodicity, a number of periods, and a MARR. This option helps when making comparisons where one alternative is uniform (such as renting) and another is not (such as buying).
Internal Rate of Return opens the Internal Rate of Return dialog box. The internal rate of return of a cashflow is the interest rate that makes its time value equal to 0. This calculation assumes uniform periodicity of the cashflow. It is particularly applicable where the choice of a MARR would be difficult.
Benefit-Cost Ratio opens the Benefit-Cost Ratio Analysis dialog box. The benefit-cost ratio divides the time value of the benefits by the time value of the costs. For example, governments often use this analysis when deciding whether to commit to a public works project.
Breakeven Analysis opens the Breakeven Analysis dialog box. Breakeven analysis computes time values at a range of MARRs for comparison, which can be advantageous when it is difficult to determine a MARR. This analysis can help you determine how the cashflow's profitability varies with your choice of MARR. A graph displaying the relationship between time value and MARR is also available.
Tasks
Performing Time Value Analysis

Suppose a rock quarry needs equipment to use for the next five years. It has two alternatives:
a box loader and conveyer system that has a one-time cost of $264,000
a two-shovel loader, which costs $84,000 but has a yearly operating cost of $36,000. This loader has a service life of three years, which necessitates the purchase of a new loader for the final two years of the rock quarry project. Assume the second loader also costs $84,000 and that its salvage value after its two-year service is $10,000.
A SAS data set that describes this situation is available at SASHELP.ROCKPIT. You expect a 13% MARR. Which is the better alternative?
To create the cashflows, follow these steps:
1. Create a cashflow with the single amount –264,000. Date the amount 01JAN1998 to be consistent with the SAS data set you load.
2. Load SASHELP.ROCKPIT into a second cashflow, as displayed in Figure 52.2.
Figure 52.2 The contents of SASHELP.ROCKPIT
To compute the time values of these investments, follow these steps:
1. Select both cashflows.
2. Select Analyze → Time Value. This opens the Time Value Analysis dialog box.
3. Enter the date 01JAN1998 into the Dates area.
4. Enter 13 for the Constant MARR.
5. Click Create Time Value Summary.
Figure 52.3 Performing the Time Value Analysis
As shown in Figure 52.3, option 1 naturally has a time value of –$264,000.00 on 01JAN1998, because its single payment occurs on that date. Option 2, however, has a time value of –$263,408.94, which makes it slightly less expensive.
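You can verify this figure with a DATA step that discounts option 2's cashflow at the 13% MARR. The yearly amounts below are reconstructed from the task description (costs are negative; year 3 combines the operating cost with the purchase of the second loader, and year 5 combines the operating cost with the salvage value):

   data _null_;
      marr = 0.13;
      array cf{0:5} _temporary_
         (-84000 -36000 -36000 -120000 -36000 -26000);
      tv = 0;
      do t = 0 to 5;
         tv + cf{t} / (1 + marr)**t;    /* discount each year's amount */
      end;
      put 'Time value of option 2: ' tv 14.2;   /* about -263,408.94   */
   run;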
Computing an Internal Rate of Return

You are choosing between five investments. A portfolio containing these investments is available at SASHELP.INVSAMP.NVST. Which investments are acceptable if you expect a MARR of 9%?
Open the portfolio SASHELP.INVSAMP.NVST and compare the investments. Note that internal rate of return computations assume regular periodicity of the cashflow. To compute the internal rates of return, follow these steps:
1. Select all five investments.
2. Select Analyze → Internal Rate of Return.
Figure 52.4 Computing an Internal Rate of Return
The results displayed in Figure 52.4 indicate that the internal rates of return for investments 2, 4, and 5 are greater than 9%. Hence, each of these is acceptable.
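In a DATA step, the same statistic is available through the IRR function (INTRR is its fractional-rate counterpart). The cashflow below is hypothetical, because the individual amounts in SASHELP.INVSAMP.NVST are not listed here:

   data _null_;
      /* IRR(freq, c0, c1, ..., cn) returns the internal rate of return,
         as a percentage per period, of the cashflow c0, ..., cn;
         freq=1 treats the amounts as one per period                     */
      rate = irr(1, -10000, 2500, 3000, 3500, 4000);
      put 'Internal rate of return (%): ' rate 8.2;
   run;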
Performing a Benefit-Cost Ratio Analysis

Suppose a municipality has excess funds to invest. It is choosing between the same investments described in the previous example. Government agencies often compute benefit-cost ratios to decide which investment to pursue. Which is best in this case?
Open the portfolio SASHELP.INVSAMP.NVST and compare the investments. To compute the benefit-cost ratios, follow these steps:
1. Select all five investments.
2. Select Analyze → Benefit-Cost Ratio.
3. Enter 01JAN1996 for the Date.
4. Enter 9 for Constant MARR.
5. Click Create Benefit-Cost Ratio Summary to fill the Benefit-Cost Ratio Summary area.
The results displayed in Figure 52.5 indicate that investments 2, 4, and 5 have ratios greater than 1. Therefore, each is profitable with a MARR of 9%.
Figure 52.5 Performing a Benefit-Cost Ratio Analysis
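The ratio itself separates the discounted positive amounts from the discounted negative ones. A sketch with a hypothetical annual cashflow and a 9% MARR:

   data _null_;
      marr = 0.09;
      array cf{0:4} _temporary_ (-10000 2500 3000 3500 4000);  /* hypothetical */
      pv_ben = 0;  pv_cost = 0;
      do t = 0 to 4;
         pv = cf{t} / (1 + marr)**t;
         if pv > 0 then pv_ben + pv;       /* discounted benefits */
         else pv_cost + (-pv);             /* discounted costs    */
      end;
      bcr = pv_ben / pv_cost;
      put 'Benefit-cost ratio: ' bcr 8.3;  /* > 1 means profitable at this MARR */
   run;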
Computing a Uniform Periodic Equivalent

Suppose you need a warehouse for ten years. You have two options:
pay rent for ten years at $23,000 per year
build a two-stage facility that you will maintain and that you intend to sell at the end of those ten years
Data sets describing these scenarios are available in the portfolio SASHELP.INVSAMP.BUYRENT. Which option is more financially sound if you desire a 12% MARR?
Open the portfolio SASHELP.INVSAMP.BUYRENT and compare the options. To perform the periodic equivalent analysis, follow these steps:
1. Load the portfolio SASHELP.INVSAMP.BUYRENT.
2. Select both cashflows.
3. Select Analyze → Periodic Equivalent. This opens the Uniform Periodic Equivalent dialog box.
4. Enter 01JAN1996 for the Start Date.
5. Enter 10 for the Number of Periods.
6. Select YEAR for the Interval.
7. Enter 12 for the Constant MARR.
8. Click Create Periodic Equivalent Summary.

Figure 52.6 Computing a Uniform Periodic Equivalent
Figure 52.6 indicates that renting costs about $1,300 less each year. Hence, renting is more financially sound. Notice the periodic equivalent for renting is not $23,000. This is because the $23,000 per year does not account for the MARR.
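Converting a present value into a uniform periodic payment uses the capital-recovery formula A = PV · r / (1 − (1 + r)^−n). In a DATA step, the MORT function solves the same equation; the present value below is hypothetical:

   data _null_;
      pv = 130000;              /* hypothetical time value at the start date */
      r  = 0.12;                /* MARR per period                           */
      n  = 10;                  /* number of uniform payments                */
      a  = mort(pv, ., r, n);   /* solve for the periodic payment            */
      put 'Uniform periodic equivalent: ' a 12.2;
   run;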
Performing a Breakeven Analysis

In the previous example you computed the uniform periodic equivalent for a rent-buy scenario. Now let's perform a breakeven analysis to see how the MARR affects the time values. To perform the breakeven analysis, follow these steps:
1. Select both options.
2. Select Analyze → Breakeven Analysis.
3. Enter 01JAN1996 for the Date.
4. Enter 12.0 for Value.
5. Enter 4.0 for (+/-).
6. Enter 0.5 for Increment by.
7. Click Create Breakeven Analysis Summary to fill the Breakeven Analysis Summary area, as displayed in Figure 52.7.

Figure 52.7 Performing a Breakeven Analysis
Click Graphics to view a plot displaying the relationship between time value and MARR.
Figure 52.8 Viewing a Breakeven Graph
As shown in Figure 52.8, renting is better if you want a MARR of 12%. However, if your MARR drops to 10.5%, buying becomes better. With a single investment, knowing where the graph has a time value of 0 tells you the MARR at which a venture switches from being profitable to being a loss. With multiple investments, knowing where the graphs for the various investments cross each other tells you at what MARR a particular investment becomes more profitable than another.
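The breakeven summary is simply the time value recomputed over a grid of MARRs. A sketch that sweeps the MARR from 8% to 16% in 0.5% steps over a hypothetical annual cashflow:

   data work.breakeven;                   /* hypothetical data set name */
      array cf{0:4} _temporary_ (-10000 2500 3000 3500 4000);
      do marr = 0.08 to 0.16 by 0.005;
         timevalue = 0;
         do t = 0 to 4;
            timevalue + cf{t} / (1 + marr)**t;
         end;
         output;                          /* one row per MARR           */
      end;
      keep marr timevalue;
   run;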
Dialog Box Guide
Time Value Analysis

Having selected a generic cashflow from the Investment Analysis dialog box, to perform a time value analysis, select Analyze → Time Value from the Investment Analysis dialog box's menu bar. This opens the Time Value Analysis dialog box displayed in Figure 52.9.
Figure 52.9 Time Value Analysis Dialog Box
The following items are displayed:
Analysis Specifications:
Dates holds the list of dates as of which to perform the time value analysis. Right-clicking within the Dates area reveals many helpful tools for managing date lists.
Constant MARR holds the desired MARR for the time value analysis. This value is used if the MARR List area is empty.
MARR List holds date-rate pairs that express your desired MARR as it changes over time. Each date refers to when that expected MARR begins. Right-clicking within the MARR List area reveals many helpful tools for managing date-rate pairs.
Create Time Value Summary becomes available when you adequately specify the analysis within the Analysis Specifications area. Clicking Create Time Value Summary then fills the Time Value Summary area.
Time Value Summary fills when you click Create Time Value Summary. The table contains a row for each date in the Dates area. The remainder of each row holds the time values at that date, one value for each investment selected.
Print becomes available when you fill the time value summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you fill the time value summary. Clicking it opens the Save Output Dataset dialog box where you can save the summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Uniform Periodic Equivalent

Having selected a generic cashflow from the Investment Analysis dialog box, to perform a uniform periodic equivalent analysis, select Analyze → Periodic Equivalent from the Investment Analysis dialog box's menu bar. This opens the Uniform Periodic Equivalent dialog box displayed in Figure 52.10.

Figure 52.10 Uniform Periodic Equivalent Dialog Box

The following items are displayed:
Analysis Specifications:
Start Date holds the date the uniform periodic equivalents begin.
Number of Periods holds the number of uniform periodic equivalents.
Interval holds how often the uniform periodic equivalents occur.
Constant MARR holds the Minimum Attractive Rate of Return.
Create Periodic Equivalent Summary becomes available when you adequately fill the Analysis Specifications area. Clicking Create Periodic Equivalent Summary then fills the periodic equivalent summary.
Periodic Equivalent Summary fills with two columns when you click Create Periodic Equivalent Summary. The first column lists the investments selected. The second column lists the computed periodic equivalent amount.
Print becomes available when you fill the periodic equivalent summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you generate the periodic equivalent summary. Clicking it opens the Save Output Dataset dialog box where you can save the summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Internal Rate of Return

Having selected a generic cashflow from the Investment Analysis dialog box, to perform an internal rate of return calculation, select Analyze → Internal Rate of Return from the Investment Analysis dialog box's menu bar. This opens the Internal Rate of Return dialog box displayed in Figure 52.11.

Figure 52.11 Internal Rate of Return Dialog Box

The following items are displayed:
IRR Summary contains a row for each investment selected. Each row holds the following:
Name holds the name of the investment.
IRR holds the internal rate of return for that investment.
Interval holds the interest rate interval for that IRR.
Print becomes available when you fill the IRR summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As opens the Save Output Dataset dialog box where you can save the IRR summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Benefit-Cost Ratio Analysis

Having selected a generic cashflow from the Investment Analysis dialog box, to compute a benefit-cost ratio, select Analyze → Benefit-Cost Ratio from the Investment Analysis dialog box's menu bar. This opens the Benefit-Cost Ratio Analysis dialog box displayed in Figure 52.12.
Figure 52.12 Benefit-Cost Ratio Analysis Dialog Box
The following items are displayed:
Analysis Specifications:
Dates holds the dates as of which to compute the benefit-cost ratios.
Constant MARR holds the desired MARR. This value is used if the MARR List area is empty.
MARR List holds date-rate pairs that express your desired MARR as it changes over time. Each date refers to when that expected MARR begins. Right-clicking within the MARR List area reveals many helpful tools for managing date-rate pairs.
Create Benefit-Cost Ratio Summary becomes available when you adequately specify the analysis. Clicking Create Benefit-Cost Ratio Summary fills the benefit-cost ratio summary.
Benefit-Cost Ratio Summary fills when you click Create Benefit-Cost Ratio Summary. The area contains a row for each date in the Dates area. The remainder of each row holds the benefit-cost ratios at that date, one value for each investment selected.
Print becomes available when you fill the benefit-cost ratio summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you generate the benefit-cost ratio summary. Clicking it opens the Save Output Dataset dialog box where you can save the summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Breakeven Analysis

Having selected a generic cashflow from the Investment Analysis dialog box, to perform a breakeven analysis, select Analyze → Breakeven Analysis from the Investment Analysis dialog box's menu bar. This opens the Breakeven Analysis dialog box displayed in Figure 52.13.

Figure 52.13 Breakeven Analysis Dialog Box

The following items are displayed:
Analysis Specification:
Analysis holds the analysis type. Only Time Value is currently available.
Date holds the date for which you perform this analysis.
Variable holds the variable upon which the breakeven analysis varies. Only MARR is currently available.
Value holds the desired rate upon which to center the analysis.
+/- holds the maximum deviation from the Value to consider.
Increment by holds the increment by which the analysis is calculated.
Create Breakeven Analysis Summary becomes available when you adequately specify the analysis. Clicking Create Breakeven Analysis Summary then fills the Breakeven Analysis Summary area.
Breakeven Analysis Summary fills when you click Create Breakeven Analysis Summary. The schedule contains a row for each MARR and date.
Graphics becomes available when you fill the Breakeven Analysis Summary area. Clicking it opens the Breakeven Graph, which plots the time value versus the MARR.
Print becomes available when you fill the breakeven analysis summary. Clicking it sends the contents of the summary to the SAS session print device.
Save Data As becomes available when you generate the breakeven analysis summary. Clicking it opens the Save Output Dataset dialog box where you can save the summary (or portions thereof) as a SAS data set.
Return takes you back to the Investment Analysis dialog box.
Breakeven Graph

Suppose you perform a breakeven analysis in the Breakeven Analysis dialog box. Once you create the breakeven analysis summary, you can click the Graphics button to open the Breakeven Graph dialog box displayed in Figure 52.14.

Figure 52.14 Breakeven Graph Dialog Box
The following item is displayed: Return takes you back to the Breakeven Analysis dialog box.
Chapter 53
Details

Contents
Investments and Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 3063
    Saving Output to SAS Data Sets . . . . . . . . . . . . . . . . . . . . 3063
    Loading a SAS Data Set into a List . . . . . . . . . . . . . . . . . . 3065
    Saving Data from a List to a SAS Data Set . . . . . . . . . . . . . . 3065
Right Mouse Button Options . . . . . . . . . . . . . . . . . . . . . . . . 3066
Depreciation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3067
    Straight Line (SL) . . . . . . . . . . . . . . . . . . . . . . . . . . . 3067
    Sum-of-Years Digits . . . . . . . . . . . . . . . . . . . . . . . . . . 3067
    Declining Balance (DB) . . . . . . . . . . . . . . . . . . . . . . . . 3069
Rate Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3070
    The Tools Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3070
    Dialog Box Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . 3071
    Minimum Attractive Rate of Return (MARR) . . . . . . . . . . . . . 3071
    Income Tax Specification . . . . . . . . . . . . . . . . . . . . . . . 3072
    Inflation Specification . . . . . . . . . . . . . . . . . . . . . . . . . 3073
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3074
Investments and Data Sets

Investment Analysis provides tools to assist you in moving data between SAS data sets and lists you can use within Investment Analysis.
Saving Output to SAS Data Sets

Many investment specifications have a button that reads Save Data As. Clicking that button opens the Save Output Dataset dialog box (see Figure 53.1). This dialog box enables you to save all or part of the area generated by the specification.
Figure 53.1 Saving to a Dataset
The following items are displayed:
Dataset Name holds the SAS data set name to which you want to save.
Browse opens the standard SAS Open dialog box, which enables you to select an existing SAS data set to overwrite.
Dataset Label holds the SAS data set's label.
Dataset Variables organizes variables. The variables listed in the Selected area are included in the SAS data set. You can select variables one at a time by clicking the single right-arrow after each selection to move it to the Selected area. If the SAS data set has many variables you want to save, it might be simpler to follow these steps:
1. Click the double right-arrow to select all available variables.
2. Remove any unwanted variable by selecting it from the Selected area and clicking the single left-arrow.
The double left-arrow removes all selected variables from the proposed SAS data set. The up and down arrows below the Available and Selected boxes enable you to scroll up and down the list of variables in their respective boxes.
Save Dataset attempts to save the SAS data set. If the SAS data set name already exists, you are asked whether you want to replace the existing SAS data set, append to the existing SAS data set, or cancel the current save attempt. You then return to this dialog box ready to create another SAS data set to save.
Return
   takes you back to the specification dialog box.
Loading a SAS Data Set into a List

Right-click in the area into which you want to load the list, and release on Load. This opens the Load Dataset dialog box (see Figure 53.2).

Figure 53.2 Load Dataset Dialog Box
The following items are displayed:

Dataset Name
   holds the name of the SAS data set that you want to load.

Browse
   opens the standard SAS Open dialog box, which aids in finding a SAS data set to load.

If there is a Date variable in the SAS data set, Investment Analysis loads it into the list. If there is no Date variable, it loads the first available time-formatted variable. If an amount or rate variable is needed, Investment Analysis searches the SAS data set for an Amount or Rate variable to use. Otherwise, it takes the first numeric variable that is not used by the Date variable.

Dataset Label
   holds the SAS data set's label.

OK
   attempts to load the SAS data set specified in Dataset Name. If the specified SAS data set exists, clicking OK returns you to the calling dialog box with the selected SAS data set filling the list. If the specified SAS data set does not exist and you click OK, you receive an error message and no SAS data set is loaded.

Cancel
   returns you to the calling dialog box without loading a SAS data set.
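For example, a cashflow list expects a date variable and a numeric amount variable. The following DATA step is a minimal sketch of a SAS data set that satisfies the search rules described above; the data set name, dates, and amounts are illustrative.

   /* Cashflow data suitable for loading into a list: */
   /* a Date variable plus a numeric Amount variable. */
   data cashflow;
      format date date9.;
      input date : date9. amount;
   datalines;
   01JAN2001 -1000
   01JAN2002 400
   01JAN2003 700
   ;
   run;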
Saving Data from a List to a SAS Data Set

Right-click in the area that holds the list, and release on Save. This opens the Save Dataset dialog box (see Figure 53.3).
Figure 53.3 Save Dataset Dialog Box
The following items are displayed:

Dataset Name
   holds the name of the SAS data set to which you want to save.

Browse
   opens the standard SAS Save As dialog box, which enables you to find an existing SAS data set to overwrite.

Dataset Label
   holds a user-defined description to be saved as the label of the SAS data set.

OK
   saves the current data to the SAS data set specified in Dataset Name. If the specified SAS data set does not already exist, clicking OK saves the SAS data set and returns you to the calling dialog box. If the specified SAS data set already exists, clicking OK warns you and enables you to replace the old SAS data set with the new one or cancel the save attempt.

Cancel
   aborts the save process. Clicking Cancel returns you to the calling dialog box without attempting to save.
Right Mouse Button Options

A pop-up menu often appears when you right-click within a table editor. The menu offers tools to aid in managing the table's entries. Most table editors provide the following options (see Figure 53.4).

Figure 53.4 Right-Clicking Options
Add
   creates a blank row.
Delete
   removes any currently selected row.

Copy
   duplicates the currently selected row.

Sort
   arranges the rows in chronological order according to the date variable.

Clear
   empties the table of all rows.

Save
   opens the Save Dataset dialog box, where you can save all rows to a SAS data set for later use.

Load
   opens the Load Dataset dialog box, where you select a SAS data set to fill the rows.

If you want to perform one of these actions on a collection of rows, you must select the rows before right-clicking. To select an adjacent range of rows: click the first row, hold down SHIFT, and click the final row. After the rows are selected, you can release the SHIFT key.
Depreciation Methods

Suppose an asset's price is $20,000 and it has a salvage value of $5,000 in five years. The following sections describe various methods to quantify the depreciation.
Straight Line (SL)

This method assumes a constant depreciation value per year. Assuming that the price of a depreciating asset is P and its salvage value after N years is S, the annual depreciation is

   (P - S) / N

For our example, the annual depreciation would be

   ($20,000 - $5,000) / 5 = $3,000
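The following DATA step is a minimal sketch that computes this straight line schedule; the data set and variable names are illustrative.

   data sl_schedule;
      P = 20000;           /* purchase price               */
      S = 5000;            /* salvage value                */
      N = 5;               /* depreciable life in years    */
      dep = (P - S) / N;   /* constant annual depreciation */
      value = P;
      do year = 1 to N;
         value = value - dep;   /* the value falls by $3,000 each year */
         output;
      end;
      keep year dep value;
   run;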
Sum-of-Years Digits

An asset often loses more of its value early in its lifetime. A method that exhibits this dynamic is desirable. Assume an asset depreciates from price P to salvage value S in N years. First compute the sum-of-years as T = 1 + 2 + ... + N. The depreciation for the years after the asset's purchase is given in Table 53.1.

Table 53.1 Sum-of-Years General Example

   Year Number   Annual Depreciation
   first         (N/T)(P - S)
   second        ((N-1)/T)(P - S)
   third         ((N-2)/T)(P - S)
   ...           ...
   final         (1/T)(P - S)

For the ith year of the asset's use, the annual depreciation is

   ((N + 1 - i)/T)(P - S)

For our example, N = 5 and the sum of years is T = 1 + 2 + 3 + 4 + 5 = 15. The depreciation during the first year is

   ($20,000 - $5,000)(5/15) = $5,000
Table 53.2 describes how the sum-of-years digits method would depreciate the asset.

Table 53.2 Sum-of-Years Example

   Year   Depreciation                         Year-End Value
   1      ($20,000 - $5,000)(5/15) = $5,000    $15,000.00
   2      ($20,000 - $5,000)(4/15) = $4,000    $11,000.00
   3      ($20,000 - $5,000)(3/15) = $3,000    $8,000.00
   4      ($20,000 - $5,000)(2/15) = $2,000    $6,000.00
   5      ($20,000 - $5,000)(1/15) = $1,000    $5,000.00
As expected, the value after N years is S:

   S = P - (5 years' depreciation)
     = P - [(5/15)(P - S) + (4/15)(P - S) + (3/15)(P - S) + (2/15)(P - S) + (1/15)(P - S)]
     = P - (15/15)(P - S)
     = P - (P - S)
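The per-year formula translates directly into a short DATA step. The following minimal sketch reproduces Table 53.2; the data set and variable names are illustrative.

   data syd_schedule;
      P = 20000;             /* purchase price                     */
      S = 5000;              /* salvage value                      */
      N = 5;                 /* depreciable life in years          */
      T = N * (N + 1) / 2;   /* sum of years: 1 + 2 + ... + N = 15 */
      value = P;
      do i = 1 to N;
         dep = (N + 1 - i) / T * (P - S);   /* depreciation in year i */
         value = value - dep;
         output;
      end;
      keep i dep value;
   run;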
Declining Balance (DB)

Recall that the straight line method assumes a constant depreciation value. Conversely, the declining balance method assumes a constant depreciation rate per year. And like the sum-of-years method, more depreciation tends to occur earlier in the asset's life.

Assume the price of a depreciating asset is P and its salvage value after N years is S. You could assume the asset depreciates by a factor of 1/N (or a rate of 100/N%). This method is known as single declining balance. The annual depreciation is

   (1/N) × (previous year's value)

So for our example, the depreciation during the first year is $20,000/5 = $4,000. Table 53.3 describes how declining balance would depreciate the asset.

Table 53.3 Declining Balance Example

   Year   Depreciation                     Year-End Value
   1      $20,000.00 / 5 = $4,000.00       $16,000.00
   2      $16,000.00 / 5 = $3,200.00       $12,800.00
   3      $12,800.00 / 5 = $2,560.00       $10,240.00
   4      $10,240.00 / 5 = $2,048.00       $8,192.00
   5      $8,192.00 / 5 = $1,638.40        $6,553.60
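The following minimal DATA step sketch reproduces Table 53.3; the data set and variable names are illustrative.

   data db_schedule;
      P = 20000;   /* purchase price            */
      N = 5;       /* depreciable life in years */
      value = P;
      do year = 1 to N;
         dep = value / N;   /* a constant 100/N% of the previous year's value */
         value = value - dep;
         output;
      end;
      keep year dep value;
   run;

Note that the salvage value S never enters this computation; a later section returns to that point.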
DB Factor

You could also accelerate the depreciation by increasing the factor (and hence the rate) at which depreciation occurs. Other commonly accepted depreciation rates are 200/N% (called double declining balance, as the depreciation factor becomes 2/N) and 150/N%. Investment Analysis enables you to choose among three factors for declining balance: 2 (with 200/N% depreciation), 1.5 (with 150/N% depreciation), and 1 (with 100/N% depreciation).
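For example, with the double declining balance factor of 2, the first-year depreciation for our example asset would be (2/5) × $20,000 = $8,000 rather than the $4,000 of single declining balance.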
Declining Balance and the Salvage Value

The declining balance method assumes that depreciation is faster earlier in an asset's life; this is what you wanted. But notice that the final value is greater than the salvage value. Even if the salvage value were greater than $6,553.60, the final year-end value would not change. The salvage value never enters the calculation, so there is no way for the salvage value to force the depreciation to assume its value.

Newnan and Lavelle (1998) describe two ways to adapt the declining balance method to assume the salvage value at the final time. One way is as follows. Suppose you call the depreciated value after i years V(i). This sets V(0) = P and V(N) = S.
If V(N) > S according to the usual calculation for V(N), redefine V(N) to equal S. If V(i) < S according to the usual calculation for V(i) for some i (and hence for all subsequent V(i) values), you can redefine all such V(i) to equal S. This alteration to declining balance forces the depreciated value of the asset after N years to be S and keeps V(i) no less than S.
Conversion to SL

The second (and preferred) way to force declining balance to assume the salvage value is by conversion to straight line. If V(N) > S, the first way redefines V(N) to equal S; you can think of this as converting to the straight line method for the last timestep. If the V(N) value supplied by DB is appreciably larger than S, then the depreciation in the final year would be unrealistically large. An alternate way is to compute the DB and SL steps at each timestep and take whichever step gives the larger depreciation (unless DB drops below the salvage value). After SL assumes a larger depreciation, it continues to be larger over the remaining life of the asset, and SL forces the value at the final time to equal the salvage value. As an algorithm, this looks like the following statements:

   V(0) = P;
   for i = 1 to N
      if DB step > SL step from (i, V(i))
         take a DB step to make V(i);
      else break;
   for j = i to N
      take an SL step to make V(j);
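The following DATA step is a minimal sketch of this combined rule, assuming the double declining balance factor; it takes the larger of the DB and SL steps each year and never depreciates below the salvage value. The data set and variable names are illustrative.

   data db_to_sl;
      P = 20000;    /* purchase price            */
      S = 5000;     /* salvage value             */
      N = 5;        /* depreciable life in years */
      factor = 2;   /* DB factor: 2, 1.5, or 1   */
      value = P;
      do year = 1 to N;
         db_step = factor / N * value;             /* declining balance step */
         sl_step = (value - S) / (N - year + 1);   /* SL over remaining life */
         dep = max(db_step, sl_step);   /* take the larger depreciation */
         dep = min(dep, value - S);     /* do not drop below salvage    */
         value = value - dep;
         output;
      end;
      keep year db_step sl_step dep value;
   run;

With these inputs the schedule takes DB steps of $8,000 and $4,800, the salvage floor limits the year 3 write-off to $2,200, and the value stays at the $5,000 salvage value thereafter; with a smaller salvage value, the SL step would eventually overtake the DB step instead.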
The MACRS, which is discussed in the section that describes the Depreciation Table window, is actually a variation on the declining balance method with conversion to the straight line method.
Rate Information
The Tools Menu

Figure 53.5 shows the Tools menu.
Figure 53.5 The Tools Menu
The Tools → Define Rates menu offers the following options.

MARR
   opens the Minimum Attractive Rate of Return (MARR) dialog box.

Income Tax Rate
   opens the Income Tax Specification dialog box.

Inflation
   opens the Inflation Specification dialog box.
Dialog Box Guide
Minimum Attractive Rate of Return (MARR)

Selecting Tools → Define Rates → MARR from the Investment Analysis dialog box menu bar opens the MARR dialog box that is displayed in Figure 53.6.
Figure 53.6 MARR Dialog Box
Name
   holds the name that you assign to the MARR specification. This must be a valid SAS name.

Constant MARR
   holds the numeric value that you choose to be the constant MARR. This value is used if the MARR List table editor is empty.

MARR List
   holds date-MARR pairs, where the date refers to when the particular MARR value begins. Each date is a SAS date.

OK
   returns you to the Investment Analysis dialog box. Clicking it causes the preceding MARR specification to be assumed when you do not specify MARR rates in a dialog box that needs MARR rates.

Cancel
   returns you to the Investment Analysis dialog box, discarding any work that was done in the MARR dialog box.
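Because the MARR List is a table editor, the right-click Load option described earlier can fill it from a SAS data set. The following DATA step is a minimal sketch of such a date-MARR list; the data set name, dates, and rates are illustrative.

   data marr_rates;
      format date date9.;
      input date : date9. marr;   /* each date marks when a MARR value begins */
   datalines;
   01JAN2001 10.0
   01JAN2003 12.5
   ;
   run;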
Income Tax Specification

Selecting Tools → Define Rates → Income Tax Rate from the Investment Analysis dialog box menu bar opens the Income Tax Specification dialog box displayed in Figure 53.7.
Figure 53.7 Income Tax Specification Dialog Box
Name
   holds the name that you assign to the Income Tax specification. This must be a valid SAS name.

Federal Tax
   holds the numeric value that you want to be the constant federal tax rate.

Local Tax
   holds the numeric value that you want to be the constant local tax rate.

Taxrate List
   holds date-income tax triples, where the date refers to when the particular income tax values begin. Each date is a SAS date, and each tax value is a percentage between 0% and 100%.

OK
   returns you to the Investment Analysis dialog box. Clicking it causes the preceding income tax specification to be the default income tax rates when you use the After Tax Cashflow Calculation dialog box.

Cancel
   returns you to the Investment Analysis dialog box, discarding any changes that were made since this dialog box was opened.
Inflation Specification

Selecting Tools → Define Rates → Inflation from the Investment Analysis dialog box menu bar opens the Inflation Specification dialog box displayed in Figure 53.8.
Figure 53.8 Inflation Specification Dialog Box
Name
   holds the name that you assign to the Inflation specification. This must be a valid SAS name.

Constant Rate
   holds the numeric value that you want to be the constant inflation rate. This value is used if the Inflation Rate List table editor is empty.

Inflation Rate List
   holds date-rate pairs, where the date refers to when the particular inflation rate begins. Each date is a SAS date, and the rate is a percentage between 0% and 120%.

OK
   returns you to the Investment Analysis dialog box. Clicking it causes the preceding inflation specification to be assumed when you use the Constant Dollar Calculation dialog box and do not specify inflation rates.

Cancel
   returns you to the Investment Analysis dialog box, discarding any changes that were made since this dialog box was opened.
Reference

Newnan, Donald G. and Lavelle, Jerome P. (1998), Engineering Economic Analysis, Austin, Texas: Engineering Press.
Subject Index @CRSPDB Date Informats SASECRSP engine, 2417 @CRSPDR Date Informats SASECRSP engine, 2417 @CRSPDT Date Informats SASECRSP engine, 2417 2SLS estimation method, see two-stage least squares 3SLS estimation method, see three-stage least squares add factors, see adjustments additive model ARIMA model, 215 additive Winters method seasonal forecasting, 846 additive-invertible region smoothing weights, 2899 ADDWINTERS method FORECAST procedure, 846 ADF test, 356 adjacency graph MODEL procedure, 1227 adjust operators, 799 adjustable rate mortgage, see LOAN procedure LOAN procedure, 872 adjusted R squared MODEL procedure, 1077 adjusted R-square statistics of fit, 2012, 2917 adjustments, 2782, 2895 add factors, 2750 forecasting models, 2750 specifying, 2750 After Tax Cashflow Calculation, 3036 AGGREGATE method EXPAND procedure, 785 aggregation of time series data, 765, 768 aggregation of time series EXPAND procedure, 765, 768 AIC, see Akaike information criterion, see Akaike’s information criterion Akaike Information Criterion VARMAX procedure, 2148 Akaike information criterion AIC, 254 ARIMA procedure, 254
AUTOREG procedure, 383 used to select state space models, 1739 Akaike information criterion corrected AUTOREG procedure, 383 Akaike’s information criterion AIC, 2917 statistics of fit, 2917 alignment of dates, 2810 time intervals, 130 alignment of dates, 146, 2810 Almon lag polynomials, see polynomial distributed lags MODEL procedure, 1152 alternatives to DIF function, 107 LAG function, 107 Amemiya’s prediction criterion statistics of fit, 2917 Amemiya’s R-square statistics of fit, 2012, 2917 AMO, 59 amortization schedule LOAN procedure, 900 analyzing models MODEL procedure, 1222 and goal seeking ordinary differential equations (ODEs), 1125 and state space models stationarity, 1719 and tests for autocorrelation lagged dependent variables, 331 and the OUTPUT statement output data sets, 83 Annuity, see Uniform Periodic Equivalent AR initial conditions conditional least squares, 1141 Hildreth-Lu, 1141 maximum likelihood, 1141 unconditional least squares, 1141 Yule-Walker, 1141 ARCH model AUTOREG procedure, 319 autoregressive conditional heteroscedasticity, 319 ARIMA model additive model, 215 ARIMA procedure, 194
autoregressive integrated moving-average model, 194, 2908 Box-Jenkins model, 194 factored model, 216 multiplicative model, 216 notation for, 210 seasonal model, 215 simulating, 2788, 2882 subset model, 215 ARIMA model specification, 2785, 2817 ARIMA models forecasting models, 2693 specifying, 2693 ARIMA procedure Akaike information criterion, 254 ARIMA model, 194 ARIMAX model, 194, 216 ARMA model, 194 autocorrelations, 197 autoregressive parameters, 259 BY groups, 231 conditional forecasts, 261 confidence limits, 260 correlation plots, 197 cross-correlation function, 244 data requirements, 224 differencing, 213, 251, 257 factored model, 216 finite memory forecasts, 261 forecasting, 260, 263 Gauss-Marquardt method, 253 ID variables, 263 infinite memory forecasts, 261 input series, 216 interaction effects, 221 intervention model, 216, 219, 222, 301 inverse autocorrelation function, 243 invertibility, 259 Iterative Outlier Detection, 310 log transformations, 262 Marquardt method, 253 Model Identification, 303 moving-average parameters, 259 naming model parameters, 259 ODS graph names, 279 ODS Graphics, 228 Outlier Detection, 308 output data sets, 265, 267, 270, 272 output table names, 275 predicted values, 260 prewhitening, 250, 251 printed output, 273 rational transfer functions, 222
regression model with ARMA errors, 216, 218 residuals, 260 Schwarz Bayesian criterion, 254 seasonal model, 215 stationarity, 198 subset model, 215 syntax, 224 time intervals, 263 transfer function model, 216, 221, 254 unconditional forecasts, 261 ARIMA process specification, 2788 ARIMAX model ARIMA procedure, 194, 216 ARIMAX models and design matrix, 221 ARMA model ARIMA procedure, 194 autoregressive moving-average model, 194 MODEL procedure, 1138 notation for, 210 as time ID observation numbers, 2671 asymptotic distribution of impulse response functions VARMAX procedure, 2135, 2143 asymptotic distribution of the parameter estimation VARMAX procedure, 2143 at annual rates percent change calculations, 109 at sign (@) operator COUNTREG procedure, 533 attributes DATASOURCE procedure, 564 attributes of variables DATASOURCE procedure, 589 audit trail, 2810 Augmented Dickey-Fuller (ADF) test, 356 augmented Dickey-Fuller tests, 234, 250 autocorrelation, 1108 autocorrelation tests, 1108 Durbin-Watson test, 353 Godfrey Lagrange test, 1108 Godfrey’s test, 353 autocorrelations ARIMA procedure, 197 multivariate, 1721 plotting, 197 prediction errors, 2658 series, 2723 automatic forecasting FORECAST procedure, 818 STATESPACE procedure, 1716
automatic generation forecasting models, 2624 automatic inclusion of interventions, 2810 automatic model selection criterion, 2838 options, 2796 automatic selection forecasting models, 2687 AUTOREG procedure Akaike information criterion, 383 Akaike information criterion corrected, 383 ARCH model, 319 autoregressive error correction, 320 BY groups, 347 Cholesky root, 372 Cochrane-Orcutt method, 374 conditional variance, 407 confidence limits, 369 dual quasi-Newton method, 381 Durbin h test, 331 Durbin t test, 331 Durbin-Watson test, 329 EGARCH model, 342 EGLS method, 374 estimation methods, 370 factored model, 334 GARCH model, 319 GARCH-M model, 342 Gauss-Marquardt method, 373 generalized Durbin-Watson tests, 329 Hannan-Quinn information criterion, 383 heteroscedasticity, 334 Hildreth-Lu method, 375 IGARCH model, 342 Kalman filter, 373 lagged dependent variables, 331 maximum likelihood method, 375 nonlinear least-squares, 375 ODS graph names, 415 output data sets, 410 output table names, 413 PGARCH model, 342 Prais-Winsten estimates, 375 predicted values, 369, 405, 406 printed output, 412 QGARCH model, 342 quasi-Newton method, 350 random walk model, 425 residuals, 369 Schwarz Bayesian criterion, 383 serial correlation correction, 320 stepwise autoregression, 332 structural predictions, 405
subset model, 334 TGARCH model, 342 Toeplitz matrix, 371 trust region method, 350 two-step full transform method, 374 Yule-Walker equations, 371 Yule-Walker estimates, 371 autoregressive conditional heteroscedasticity, see ARCH model autoregressive error correction AUTOREG procedure, 320 autoregressive integrated moving-average model, see ARIMA model, see ARIMA model autoregressive models FORECAST procedure, 840 MODEL procedure, 1138 autoregressive moving-average model, see ARMA model autoregressive parameters ARIMA procedure, 259 auxiliary data sets DATASOURCE procedure, 564 auxiliary equations, 1124 MODEL procedure, 1124 balance of payment statistics data files, see DATASOURCE procedure balloon payment mortgage, see LOAN procedure LOAN procedure, 872 bandwidth functions, 1062 bar (|) operator COUNTREG procedure, 532 Base SAS software, 49 Basmann test SYSLIN procedure, 1787, 1802 batch mode, see unattended mode Bayesian vector autoregressive models VARMAX procedure, 2096, 2139 BDS test, 351, 396 BDS test for Independence, 351, 396 BEA data files, see DATASOURCE procedure BEA national income and product accounts PC Format DATASOURCE procedure, 634 BEA S-pages, see DATASOURCE procedure Benefit-Cost Ratio Analysis, 3051 between between estimators, 1339 Between Estimators PANEL procedure, 1339 between estimators, 1339 between, 1339 between levels and rates
interpolation, 123 between stocks and flows interpolation, 123 BIC, see Schwarz Bayesian information criterion block structure MODEL procedure, 1227 BLS consumer price index surveys DATASOURCE procedure, 635 BLS data files, see DATASOURCE procedure BLS national employment, hours, and earnings survey DATASOURCE procedure, 636 BLS producer price index survey DATASOURCE procedure, 635 BLS state and area employment, hours, and earnings survey DATASOURCE procedure, 637 Bond, 3004 BOPS data file DATASOURCE procedure, 653 boundaries smoothing weights, 2899 bounds on parameter estimates, 688, 1024, 1432 BOUNDS statement, 688, 1024, 1432 Box Cox transformations, 2895 Box Cox transformation, see transformations Box-Cox transformation BOXCOXAR macro, 154 Box-Jenkins model, see ARIMA model BOXCOXAR macro Box-Cox transformation, 154 output data sets, 155 SAS macros, 154 break even analysis LOAN procedure, 896 Breakeven Analysis, 3053 Breusch-Pagan test, 1100 heteroscedasticity tests, 1100 Brown smoothing model, see double exponential smoothing Bureau of Economic Analysis data files, see DATASOURCE procedure Bureau of Labor Statistics data files, see DATASOURCE procedure buydown rate loans, see LOAN procedure LOAN procedure, 872 BY groups ARIMA procedure, 231 AUTOREG procedure, 347 COUNTREG procedure, 525 cross-sectional dimensions and, 79 ESM procedure, 733 EXPAND procedure, 775
FORECAST procedure, 838 MDC procedure, 936 PANEL procedure, 1320 PDLREG procedure, 1402 SEVERITY procedure, 1514 SIMILARITY procedure, 1598 SIMLIN procedure, 1664 SPECTRA procedure, 1694 STATESPACE procedure, 1734 SYSLIN procedure, 1785 TIMESERIES procedure, 1860 TSCSREG procedure, 1927 UCM procedure, 1952 X11 procedure, 2240 X12 procedure, 2311 BY groups and time series cross-sectional form, 79 calculation of leads, 111 calculations smoothing models, 2897 calendar calculations functions for, 94, 147 interval functions and, 103 time intervals and, 103 calendar calculations and INTCK function, 103 INTNX function, 103 time intervals, 103 calendar functions and date values, 95 calendar variables, 94 computing dates from, 95 computing from dates, 95 computing from datetime values, 97 canonical correlation analysis for selection of state space models, 1718, 1741 STATESPACE procedure, 1718, 1741 Cashflow, see Generic Cashflow CATALOG procedure, 49 SAS catalogs, 49 Cauchy distribution estimation example, 1267 examples, 1267 CDT (COMPUTAB data table) COMPUTAB procedure, 488 ceiling of time intervals, 101 Censored Regression Models QLIM procedure, 1446 Census X-11 method, see X11 procedure Census X-11 methodology
X11 procedure, 2253 Census X-12 method, see X12 procedure Center for Research in Security Prices data files, see DATASOURCE procedure centered moving time window operators, 790, 791 change vector, 1077 changes in trend forecasting models, 2758 changing by interpolation frequency, 122, 765, 778 periodicity, 122, 765 sampling frequency, 122 changing periodicity EXPAND procedure, 122 time series data, 122, 765 character functions, 51 character variables MODEL procedure, 1204 CHART procedure, 49 histograms, 49 checking data periodicity INTNX function, 102 time intervals, 102 Chirp-Z algorithm SPECTRA procedure, 1696 choice of instrumental variables, 1134 Cholesky root AUTOREG procedure, 372 Chow test, 352, 354, 404 Chow test for structural change, 352 Chow tests, 1131 MODEL procedure, 1131 CITIBASE format DATASOURCE procedure, 567 CITIBASE old format DATASOURCE procedure, 638 CITIBASE PC format DATASOURCE procedure, 639 classical decomposition operators, 794 classification variables COUNTREG procedure, 531 Cochrane-Orcutt method AUTOREG procedure, 374 coherency cross-spectral analysis, 1701 coherency of cross-spectrum SPECTRA procedure, 1701 cointegration VARMAX procedure, 2150 cointegration test, 355, 390 cointegration testing VARMAX procedure, 2094, 2154
collinearity diagnostics MODEL procedure, 1082, 1091 column blocks COMPUTAB procedure, 489 column selection COMPUTAB procedure, 486, 487 COLxxxxx: label COMPUTAB procedure, 480 combination models forecasting models, 2710 specifying, 2710 combined seasonality test, 2270, 2340 combined with cross-sectional dimension interleaved time series, 81 combined with interleaved time series cross-sectional dimensions, 81 combining forecasts, 2819, 2894 combining time series data sets, 117 Command Reference, 2773 common trends VARMAX procedure, 2150 common trends testing VARMAX procedure, 2096, 2151 COMPARE procedure, 49 comparing SAS data sets, 49 comparing forecasting models, 2733, 2833 comparing forecasting models, 2733, 2833 comparing loans LOAN procedure, 879, 896, 900 comparing SAS data sets, see COMPARE procedure compiler listing MODEL procedure, 1220 COMPUSTAT data files, see DATASOURCE procedure DATASOURCE procedure, 639 COMPUSTAT IBM 360/370 general format 48 quarter files DATASOURCE procedure, 641 COMPUSTAT IBM 360/370 general format annual files DATASOURCE procedure, 640 COMPUSTAT universal character format 48 quarter files DATASOURCE procedure, 643 COMPUSTAT universal character format annual files DATASOURCE procedure, 642 COMPUTAB procedure CDT (COMPUTAB data table), 488 column blocks, 489 column selection, 486, 487 COLxxxxx: label, 480
consolidation tables, 480 controlling row and column block execution, 487 input block, 489 missing values, 491 order of calculations, 485 output data sets, 491 program flow, 482 programming statements, 479 reserved words, 490 row blocks, 490 ROWxxxxx: label, 480 table cells, direct access to, 490 computational details VARMAX procedure, 2192 computing calendar variables from datetime values, 97 computing ceiling of intervals INTNX function, 101 computing dates from calendar variables, 95 computing datetime values from date values, 96 computing ending date of intervals INTNX function, 100 computing from calendar variables datetime values, 96 computing from dates calendar variables, 95 computing from datetime values calendar variables, 97 date values, 96 time variables, 97 computing from time variables datetime values, 96 computing lags RETAIN statement, 107 computing midpoint date of intervals INTNX function, 100 computing time variables from datetime values, 97 computing widths of intervals INTNX function, 100 concatenated data set, 2638 concentrated likelihood Hessian, 1071 conditional forecasts ARIMA procedure, 261 conditional least squares AR initial conditions, 1141 MA Initial Conditions, 1142 conditional logit model MDC procedure, 914, 915, 950 conditional t distribution
GARCH model, 380 conditional variance AUTOREG procedure, 407 predicted values, 407 predicting, 407 confidence limits, 2664 ARIMA procedure, 260 AUTOREG procedure, 369 FORECAST procedure, 851 forecasts, 2664 PDLREG procedure, 1406 STATESPACE procedure, 1749 VARMAX procedure, 2179 consolidation tables COMPUTAB procedure, 480 Constant Dollar Calculation, 3040 constrained estimation heteroscedasticity models, 363 Consumer Price Index Surveys, see DATASOURCE procedure contemporaneous correlation of errors across equations, 1797 contents of SAS data sets, 49 CONTENTS procedure, 49 SASECRSP engine, 2402 SASEFAME engine, 2501 continuous compounding LOAN procedure, 894 continuous variables, 531 contrasted with flow variables stocks, 768 contrasted with flows or rates levels, 768 contrasted with missing values omitted observations, 78 contrasted with omitted observations missing observations, 78 missing values, 78 contrasted with stock variables flows, 768 contrasted with stocks or levels rates, 768 control charts, 56 control key for multiple selections, 2626 control variables MODEL procedure, 1202 controlling row and column block execution COMPUTAB procedure, 487 controlling starting values MODEL procedure, 1084 convergence criteria MODEL procedure, 1078
convergence problems VARMAX procedure, 2192 conversion methods EXPAND procedure, 783 convert option SASEFAME engine, 2501 Converting Dates Using the CRSP Date Functions SASECRSP engine, 2416 converting frequency of time series data, 765 COPY procedure, 49 copying SAS data sets, 49 CORR procedure, 49 corrected sum of squares statistics of fit, 2917 correlation plots ARIMA procedure, 197 cospectrum estimate cross-spectral analysis, 1701 SPECTRA procedure, 1701 counting time intervals, 98, 101 counting time intervals INTCK function, 101 COUNTREG procedure bounds on parameter estimates, 525 BY groups, 525 output table names, 547 restrictions on parameter estimates, 529 syntax, 521 covariance estimates GARCH model, 353 Covariance of GMM estimators, 1065 covariance of the parameter estimates, 1057 covariance stationarity VARMAX procedure, 2174 covariates heteroscedasticity models, 362, 1437 CPORT procedure, 49 CPU requirements VARMAX procedure, 2193 creating time ID variable, 2667 creating a Fame view, see SASEFAME engine creating a Haver view, see SASEHAVR engine creating from Model Viewer HTML, 2846 creating from Time Series Viewer HTML, 2886 criterion automatic model selection, 2838 cross sectional dimensions represented by different series, 79
cross sections DATASOURCE procedure, 571, 573, 576, 587 cross-correlation function ARIMA procedure, 244 cross-equation covariance matrix MODEL procedure, 1076 seemingly unrelated regression, 1060 cross-periodogram cross-spectral analysis, 1690, 1701 SPECTRA procedure, 1701 cross-reference MODEL procedure, 1219 cross-sectional dimensions, 79 combined with interleaved time series, 81 ID variables for, 79 represented with BY groups, 79 transposing time series, 119 cross-sectional dimensions and BY groups, 79 cross-spectral analysis coherency, 1701 cospectrum estimate, 1701 cross-periodogram, 1690, 1701 cross-spectrum, 1701 quadrature spectrum, 1701 SPECTRA procedure, 1690, 1701 cross-spectrum cross-spectral analysis, 1701 SPECTRA procedure, 1701 crossproducts estimator of the covariance matrix, 1071 crossproducts matrix, 1093 crosstabulations, see FREQ procedure CRSP and SAS Dates SASECRSP engine, 2416 CRSP annual data DATASOURCE procedure, 649 CRSP calendar/indices files DATASOURCE procedure, 645 CRSP daily binary files DATASOURCE procedure, 644 CRSP daily character files DATASOURCE procedure, 644 CRSP daily IBM binary files DATASOURCE procedure, 644 CRSP daily security files DATASOURCE procedure, 646 CRSP data files, see DATASOURCE procedure CRSP Date Formats SASECRSP engine, 2416 CRSP Date Functions SASECRSP engine, 2416 CRSP Date Informats
SASECRSP engine, 2417 CRSP Integer Date Format SASECRSP engine, 2416 CRSP monthly binary files DATASOURCE procedure, 644 CRSP monthly character files DATASOURCE procedure, 644 CRSP monthly IBM binary files DATASOURCE procedure, 644 CRSP monthly security files DATASOURCE procedure, 647 CRSP stock files DATASOURCE procedure, 644 CRSPAccess Database DATASOURCE procedure, 644 CRSPDB_SASCAL environment variable SASECRSP engine, 2401 CRSPDCI Date Functions SASECRSP engine, 2418 CRSPDCS Date Functions SASECRSP engine, 2418 CRSPDI2S Date Function SASECRSP engine, 2418 CRSPDIC Date Functions SASECRSP engine, 2418 CRSPDS2I Date Function SASECRSP engine, 2418 CRSPDSC Date Functions SASECRSP engine, 2418 CRSPDT Date Formats SASECRSP engine, 2416 cubic trend curves, 2912 cubic trend, 2912 cumulative statistics operators, 792 Currency Conversion, 3038 custom model specification, 2797 custom models forecasting models, 2700 specifying, 2700 CUSUM statistics, 368, 382 Da Silva method PANEL procedure, 1350 damped-trend exponential smoothing, 2903 smoothing models, 2903 data frequency, see time intervals data periodicity FORECAST procedure, 839 data requirements ARIMA procedure, 224 FORECAST procedure, 850 X11 procedure, 2258 data set, 2633
concatenated, 2638 forecast data set, 2634 forms of, 2634 interleaved, 2636 simple, 2635 data set selection, 2619, 2801 DATA step, 49 SAS data sets, 49 DATASETS procedure, 49 DATASOURCE procedure attributes, 564 attributes of variables, 589 auxiliary data sets, 564 balance of payment statistics data files, 564 BEA data files, 564 BEA national income and product accounts PC Format, 634 BEA S-pages, 564 BLS consumer price index surveys, 635 BLS data files, 564 BLS national employment, hours, and earnings survey, 636 BLS producer price index survey, 635 BLS state and area employment, hours, and earnings survey, 637 BOPS data file, 653 Bureau of Economic Analysis data files, 564 Bureau of Labor Statistics data files, 564 Center for Research in Security Prices data files, 564 CITIBASE format, 567 CITIBASE old format, 638 CITIBASE PC format, 639 COMPUSTAT data files, 564, 639 COMPUSTAT IBM 360/370 general format 48 quarter files, 641 COMPUSTAT IBM 360/370 general format annual files, 640 COMPUSTAT universal character format 48 quarter files, 643 COMPUSTAT universal character format annual files, 642 Consumer Price Index Surveys, 564 cross sections, 571, 573, 576, 587 CRSP annual data, 649 CRSP calendar/indices files, 645 CRSP daily binary files, 644 CRSP daily character files, 644 CRSP daily IBM binary files, 644 CRSP daily security files, 646 CRSP data files, 564 CRSP monthly binary files, 644 CRSP monthly character files, 644 CRSP monthly IBM binary files, 644
CRSP monthly security files, 647 CRSP stock files, 644 CRSPAccess Database, 644 direction of trade statistics data files, 564 DOTS data file, 653 DRI Data Delivery Service data files, 564 DRI data files, 564, 637 DRI/McGraw-Hill data files, 564, 637 DRIBASIC data files, 638 DRIBASIC economics format, 567 DRIDDS data files, 638 employment, hours, and earnings survey, 564 event variables, 586, 587, 593 FAME data files, 564 FAME Information Services Databases, 564, 649 formatting variables, 589 frequency of data, 568 frequency of input data, 583 generic variables, 594 GFS data files, 654 Global Insight data files, 564, 637, 638 Global Insight DRI data files, 637 government finance statistics data files, 564 Haver Analytics data files, 651 ID variable, 593 IMF balance of payment statistics, 653 IMF data files, 564 IMF direction of trade statistics, 653 IMF Economic Information System data files, 652 IMF government finance statistics, 654 IMF International Financial Statistics, 571 IMF international financial statistics, 652 indexing the OUT= data set, 582, 625 input file, 582, 583 international financial statistics data files, 564 International Monetary Fund data files, 564, 652 labeling variables, 590 lengths of variables, 578, 590 main economic indicators (OECD) data files, 564 national accounts data files (OECD), 564 national income and product accounts, 564, 634 NIPA Tables, 634 obtaining descriptive information, 569, 573–575, 594–597 OECD ANA data files, 654 OECD annual national accounts, 654 OECD data files, 564 OECD main economic indicators, 656 OECD MEI data files, 656
OECD QNA data files, 655 OECD quarterly national accounts, 655 Organization for Economic Cooperation and Development data files, 564, 654 OUTALL= data set, 574 OUTBY= data set, 573 OUTCONT= data set, 569, 575 output data sets, 567, 592, 594–597 Producer Price Index Survey, 564 reading data files, 567 renaming variables, 576, 591 SAS YEARCUTOFF= option, 588 state and area employment, hours, and earnings survey, 564 stock data files, 564 subsetting data files, 567, 580 time range, 588 time range of data, 570 time series variables, 568, 593 type of input data file, 582 U.S. Bureau of Economic Analysis data files, 634 U.S. Bureau of Labor Statistics data files, 635 variable list, 591 DATE ID variables, 71 date values, 2617 calendar functions and, 95 computing datetime values from, 96 computing from datetime values, 96 difference between dates, 100 formats, 70, 142 formats for, 70 functions, 147 incrementing by intervals, 98 informats, 69, 140 informats for, 69 INTNX function and, 98 normalizing to intervals, 100 SAS representation for, 68 syntax for, 68 time intervals, 128 time intervals and, 99 DATE variable, 71 dates alignment of, 2810 DATETIME ID variables, 71 datetime values computing calendar variables from, 97 computing from calendar variables, 96 computing from time variables, 96 computing time variables from, 97 formats, 70, 146
formats for, 70 functions, 147 informats, 69, 140 informats for, 69 SAS representation for, 69 syntax for, 69 time intervals, 128 DATETIME variable, 71 dating variables, 2674 decomposition of prediction error covariance VARMAX procedure, 2089, 2125 default time ranges, 2803 defined INTCK function, 98 interleaved time series, 80 INTNX function, 97 omitted observations, 78 time values, 69 definition S matrix, 1058 time series, 2608 degrees of freedom correction, 1076 denominator factors transfer function model, 222 dependency list MODEL procedure, 1223 Depreciation, 3001 derivatives MODEL procedure, 1207 DERT. variable, 1117 descriptive statistics, see UNIVARIATE procedure design matrix ARIMAX models and, 221 details generalized method of moments, 1061 developing forecasting models, 2645, 2804 developing forecasting models, 2645, 2804 DFPVALUE macro Dickey-Fuller test, 157 SAS macros, 157 DFTEST macro Dickey-Fuller test, 158 output data sets, 159 SAS macros, 158 seasonality, testing for, 158 stationarity, testing for, 158 diagnostic tests, 2681, 2915 time series, 2681 diagnostics and debugging MODEL procedure, 1217 Dickey-Fuller test, 2916 DFPVALUE macro, 157 DFTEST macro, 158
PROBDF Function, 162 significance probabilities, 162 significance probabilities for, 157 unit root, 162 VARMAX procedure, 2094 Dickey-Fuller tests, 234 DIF function alternatives to, 107 explained, 105 higher order differences, 108 introduced, 104 MODEL procedure version, 107 multiperiod lags and, 108 percent change calculations and, 109, 110 pitfalls of, 106 second difference, 108 DIF function and differencing, 104–106 difference between dates date values, 100 differences with X11ARIMA/88 X11 procedure, 2252 Differencing, 2812 differencing ARIMA procedure, 213, 251, 257 DIF function and, 104–106 higher order, 108 MODEL procedure and, 107 multiperiod differences, 108 percent change calculations and, 109, 110 RETAIN statement and, 107 second difference, 108 STATESPACE procedure, 1735 testing order of, 158 time series data, 104–110 VARMAX procedure, 2086 different forms of output data sets, 82 differential algebraic equations ordinary differential equations (ODEs), 1197 differential equations See ordinary differential equations, 1120 direction of trade statistics data files, see DATASOURCE procedure discrete variables, see classification variables discussed EXPAND procedure, 121 distributed lag regression models PDLREG procedure, 1395 distribution of time series, 768 distribution of time series data, 768 distribution of time series
EXPAND procedure, 768 DOT as a GLUE character SASEFAME engine, 2507 DOTS data file DATASOURCE procedure, 653 double exponential smoothing, see exponential smoothing, 2901 Brown smoothing model, 2901 smoothing models, 2901 DRI Data Delivery Service data files, see DATASOURCE procedure DRI data files, see DATASOURCE procedure DATASOURCE procedure, 637 DRI data files in FAME.db, see SASEFAME engine DRI/McGraw-Hill data files, see DATASOURCE procedure DATASOURCE procedure, 637 DRI/McGraw-Hill data files in FAME.db, see SASEFAME engine DRIBASIC data files DATASOURCE procedure, 638 DRIBASIC economics format DATASOURCE procedure, 567 DRIDDS data files DATASOURCE procedure, 638 DROP in the DATA step SASEFAME engine, 2517 dual quasi-Newton method AUTOREG procedure, 381 Durbin h test AUTOREG procedure, 331 Durbin t test AUTOREG procedure, 331 Durbin-Watson MODEL procedure, 1075 Durbin-Watson test autocorrelation tests, 353 AUTOREG procedure, 329 for first-order autocorrelation, 329 for higher-order autocorrelation, 329 p-values for, 329 Durbin-Watson tests, 353 linearized form, 361 dynamic models SIMLIN procedure, 1660, 1661, 1667, 1682 dynamic multipliers SIMLIN procedure, 1667, 1668 dynamic regression, 194, 216, 2813, 2814 specifying, 2751 dynamic regressors forecasting models, 2751 dynamic simulation, 1118 MODEL procedure, 1118, 1167
SIMLIN procedure, 1661 dynamic simultaneous equation models VARMAX procedure, 2108 econometrics features in SAS/ETS software, 23 editing selection list forecasting models, 2706 EGARCH model AUTOREG procedure, 342 EGLS method AUTOREG procedure, 374 embedded in time series missing values, 78 embedded missing values, 78 embedded missing values in time series data, 78 Empirical Distribution Estimation MODEL procedure, 1073 employment, hours, and earnings survey, see DATASOURCE procedure ending dates of time intervals, 100 endogenous variables SYSLIN procedure, 1764 endpoint restrictions for polynomial distributed lags, 1396, 1402 Engle’s Lagrange Multiplier test, 403 Engle’s Lagrange Multiplier test for Heteroscedasticity, 403 Enterprise Guide, 58 Enterprise Miner—Time Series nodes, 59 ENTROPY procedure input data sets, 707 missing values, 706 ODS graph names, 710 output data sets, 708 output table names, 709 Environment variable, CRSPDB_SASCAL SASECRSP engine, 2401 EQ. variables, 1109, 1204 equality restriction linear models, 692 nonlinear models, 1049, 1126 equation translations MODEL procedure, 1204 equation variables MODEL procedure, 1201 Error model options, 2815 error sum of squares statistics of fit, 2917 ERROR. variables, 1204 errors across equations contemporaneous correlation of, 1797
ESACF (Extended Sample Autocorrelation Function method), 245 ESM procedure BY groups, 733 ODS graph names, 749 EST= data set SIMLIN procedure, 1669 ESTIMATE statement, 1031 estimation convergence problems MODEL procedure, 1088 estimation methods AUTOREG procedure, 370 MODEL procedure, 1057 estimation of ordinary differential equations, 1120 MODEL procedure, 1120 evaluation range, 2878 event variables DATASOURCE procedure, 586, 587, 593 example Cauchy distribution estimation, 1267 generalized method of moments, 1104, 1155, 1158–1160 Goldfeld Quandt Switching Regression Model, 1269 Mixture of Distributions, 1273 Multivariate Mixture of Distributions, 1273 ordinary differential equations (ODEs), 1263 The D-method, 1269 example of Bayesian VAR modeling VARMAX procedure, 2058 example of Bayesian VECM modeling VARMAX procedure, 2065 example of causality testing VARMAX procedure, 2073 example of cointegration testing VARMAX procedure, 2061 example of multivariate GARCH modeling VARMAX procedure, 2175 example of restricted parameter estimation and testing VARMAX procedure, 2071 example of VAR modeling VARMAX procedure, 2051 example of VARMA modeling VARMAX procedure, 2144 example of vector autoregressive modeling with exogenous variables VARMAX procedure, 2066 example of vector error correction modeling VARMAX procedure, 2060 example, COUNTREG, 548 examples Cauchy distribution estimation, 1267 Monte Carlo simulation, 1266
Simulating from a Mixture of Distributions, 1273 Switching Regression example, 1269 systems of differential equations, 1263 examples of time intervals, 134 exogenous variables SYSLIN procedure, 1764 EXPAND procedure AGGREGATE method, 785 aggregation of time series, 765, 768 BY groups, 775 changing periodicity, 122 conversion methods, 783 discussed, 121 distribution of time series, 768 extrapolation, 781 frequency, 765 ID variables, 777, 779 interpolation methods, 783 interpolation of missing values, 122 JOIN method, 784 ODS graph names, 803 output data sets, 801 range of output observations, 780 SPLINE method, 783 STEP method, 785 time intervals, 779 transformation of time series, 770, 786 transformation operations, 786 EXPAND procedure and interpolation, 121 time intervals, 122 experimental design, 56 explained DIF function, 105 LAG function, 105 explosive differential equations, 1197 ordinary differential equations (ODEs), 1197 exponential trend curves, 2913 exponential smoothing, see smoothing models double exponential smoothing, 842 FORECAST procedure, 818, 842 single exponential smoothing, 842 triple exponential smoothing, 842 exponential trend, 2913 Extended Sample Autocorrelation Function (ESACF) method, 245 external forecasts, 2894 external forecasts, 2894 external sources forecasting models, 2713, 2816
extrapolation EXPAND procedure, 781 Factored ARIMA, 2783, 2812, 2851 Factored ARIMA model specification, 2817 Factored ARIMA models forecasting models, 2696 specifying, 2696 factored model ARIMA model, 216 ARIMA procedure, 216 AUTOREG procedure, 334 FAME data files, see DATASOURCE procedure Fame data files, see SASEFAME engine Fame glue symbol named DOT SASEFAME engine, 2512 FAME Information Services Databases, see DATASOURCE procedure DATASOURCE procedure, 649 Fame Information Services Databases, see SASEFAME engine fast Fourier transform SPECTRA procedure, 1696 fatal error when reading from a Fame data base SASEFAME engine, 2501 FCMP procedure, 49 SAS functions, 49 features in SAS/ETS software econometrics, 23 FIML estimation method, see full information maximum likelihood Financial Functions PROBDF Function, 162 financial functions, 51 finishing the Fame CHLI SASEFAME engine, 2501 finite Fourier transform SPECTRA procedure, 1690 finite memory forecasts ARIMA procedure, 261 first-stage R squares, 1137 fitting forecasting models, 2648 fitting forecasting models, 2648 fixed effects model one-way, 1332 two-way, 1333 fixed rate mortgage, see LOAN procedure LOAN procedure, 872 flows contrasted with stock variables, 768 for first-order autocorrelation Durbin-Watson test, 329 for higher-order autocorrelation
Durbin-Watson test, 329 for interleaved time series ID variables, 80 for multiple selections control key, 2626 for nonlinear models instrumental variables, 1134 for selection of state space models canonical correlation analysis, 1718, 1741 for time series data ID variables, 67 forecast combination, 2819, 2894 FORECAST command, 2774 forecast data set, see output data set forecast horizon, 2803, 2878 forecast options, 2823 FORECAST procedure ADDWINTERS method, 846 automatic forecasting, 818 autoregressive models, 840 BY groups, 838 confidence limits, 851 data periodicity, 839 data requirements, 850 exponential smoothing, 818, 842 forecasting, 818 Holt two-parameter exponential smoothing, 818, 847 ID variables, 839 missing values, 839 output data sets, 850, 852 predicted values, 851 residuals, 851 seasonal forecasting, 843, 846 seasonality, 848 smoothing weights, 847 STEPAR method, 840 stepwise autoregression, 818, 840 time intervals, 839 time series methods, 830 time trend models, 828 Winters method, 818, 843 FORECAST procedure and interleaved time series, 80, 81 Forecast Studio, 51 forecasting, 2892 ARIMA procedure, 260, 263 FORECAST procedure, 818 MODEL procedure, 1169 STATESPACE procedure, 1716, 1745 VARMAX procedure, 2122 Forecasting menusystem, 46 forecasting models adjustments, 2750
ARIMA models, 2693 automatic generation, 2624 automatic selection, 2687 changes in trend, 2758 combination models, 2710 comparing, 2733, 2833 custom models, 2700 developing, 2645, 2804 dynamic regressors, 2751 editing selection list, 2706 external sources, 2713, 2816 Factored ARIMA models, 2696 fitting, 2648 interventions, 2755 level shifts, 2760 linear trend, 2742 predictor variables, 2739 reference, 2736 regressors, 2747 seasonal dummy variables, 2767 selecting from a list, 2685 smoothing models, 2690, 2897 sorting, 2732, 2809 specifying, 2681 transfer functions, 2910 trend curves, 2743 forecasting of Bayesian vector autoregressive models VARMAX procedure, 2140 forecasting process, 2617 forecasting project, 2638 managing, 2827 Project Management window, 2639 saving and restoring, 2640 sharing, 2644 forecasts, 2665 confidence limits, 2664 external, 2894 plotting, 2664 producing, 2632, 2852 form of state space models, 1716 formats date values, 70, 142 datetime values, 70, 146 recommended for time series ID, 71 time values, 146 formats for date values, 70 datetime values, 70 formatting variables DATASOURCE procedure, 589 forms of data set, 2634
Fourier coefficients SPECTRA procedure, 1701 Fourier transform SPECTRA procedure, 1690 fractional operators, 796 FREQ procedure, 49 crosstabulations, 49 frequency changing by interpolation, 122, 765, 778 EXPAND procedure, 765 of time series observations, 84, 122 SPECTRA procedure, 1700 time intervals and, 84, 122 frequency of data, see time intervals DATASOURCE procedure, 568 frequency of input data DATASOURCE procedure, 583 frequency option SASEHAVR engine, 2556 from interleaved form transposing time series, 117 from standard form transposing time series, 120 full information maximum likelihood FIML estimation method, 1762 MODEL procedure, 1069 SYSLIN procedure, 1772, 1797 Fuller Battese variance components, 1340 Fuller’s modification to LIML SYSLIN procedure, 1802 functions, 51 date values, 147 datetime values, 147 lag functions, 1209 mathematical functions, 1208 random-number functions, 1208 time intervals, 147 time values, 147 functions across time MODEL procedure, 1209 functions for calendar calculations, 94, 147 time intervals, 97, 147 functions of parameters nonlinear models, 1031 G4 inverse, 1035 GARCH in mean model, see GARCH-M model GARCH model AUTOREG procedure, 319 conditional t distribution, 380 covariance estimates, 353
generalized autoregressive conditional heteroscedasticity, 319 heteroscedasticity models, 362 initial values, 361 starting values, 350 t distribution, 380 GARCH-M model, 380 AUTOREG procedure, 342 GARCH in mean model, 380 Gauss-Marquardt method ARIMA procedure, 253 AUTOREG procedure, 373 Gauss-Newton method, 1077 Gaussian distribution MODEL procedure, 1030 General Form Equations Jacobi method, 1191 Seidel method, 1191 generalized autoregressive conditional heteroscedasticity, see GARCH model generalized Durbin-Watson tests AUTOREG procedure, 329 generalized least squares PANEL procedure, 1348 generalized least squares estimator of the covariance matrix, 1071 generalized least-squares Yule-Walker method as, 374 Generalized Method of Moments V matrix, 1062, 1067 generalized method of moments details, 1061 example, 1104, 1155, 1158–1160 generating models, 2789 Generic Cashflow, 3008 generic variables DATASOURCE procedure, 594 GFS data files DATASOURCE procedure, 654 giving dates to time series data, 67 Global Insight data files DATASOURCE procedure, 637, 638 Global Insight DRI data files, see DATASOURCE procedure DATASOURCE procedure, 637 global statements, 50 GLUE symbol SASEFAME engine, 2507 GMM simulated method of moments, 1066 SMM, 1066 GMM in Panel: Arellano and Bond’s Estimator Panel GMM, 1352
goal seeking MODEL procedure, 1187 goal seeking problems, 1125 Godfrey Lagrange test autocorrelation tests, 1108 Godfrey’s test, 353 autocorrelation tests, 353 Goldfeld Quandt Switching Regression Model example, 1269 goodness of fit, see statistics of fit goodness-of-fit statistics, see statistics of fit, 2872, see statistics of fit government finance statistics data files, see DATASOURCE procedure gradient of the objective function, 1092, 1093 Granger causality test VARMAX procedure, 2136 graphics SAS/GRAPH software, 52 graphs, see Model Viewer, see Time Series Viewer grid search MODEL procedure, 1086 Hannan-Quinn information criterion AUTOREG procedure, 383 Hausman specification test, 1129 MODEL procedure, 1129 Haver Analytics data files DATASOURCE procedure, 651 Haver data files, see SASEHAVR engine Haver Information Services Databases, see SASEHAVR engine HCCME 2SLS, 1107 HCCME 3SLS, 1107 HCCME = hccme=0, 1361 PANEL procedure, 1361 HCCME OLS, 1105 HCCME SUR, 1106 hccme=0 HCCME =, 1361 help system, 22 Henze-Zirkler test, 1098 normality tests, 1098 heteroscedastic errors, 1061 heteroscedastic extreme value model MDC procedure, 925, 952 Heteroscedasticity Engle’s Lagrange Multiplier test for, 403 Lee and King’s test for, 403 Portmanteau Q test for, 402 Wong and Li’s test for, 404 heteroscedasticity, 997, 1100 AUTOREG procedure, 334
Lagrange multiplier test, 364 testing for, 334 Heteroscedasticity Corrected Covariance Matrices, 1361 heteroscedasticity models, see GARCH model constrained estimation, 363 covariates, 362, 1437 link function, 363 heteroscedasticity tests Breusch-Pagan test, 1100 Lagrange multiplier test, 364 White’s test, 1100 Heteroscedasticity-Consistent Covariance Matrix Estimation , 1105 higher order differencing, 108 higher order differences DIF function, 108 higher order sums summation, 113 Hildreth-Lu AR initial conditions, 1141 Hildreth-Lu method AUTOREG procedure, 375 histograms, see CHART procedure hold-out sample, 2803 hold-out samples, 2736 Holt smoothing model, see linear exponential smoothing Holt two-parameter exponential smoothing FORECAST procedure, 818, 847 Holt-Winters Method, see Winters Method Holt-Winters method, see Winters method homoscedastic errors, 1100 HTML creating from Model Viewer, 2846 creating from Time Series Viewer, 2886 hyperbolic trend curves, 2913 hyperbolic trend, 2913 ID groups MDC procedure, 936 ID values for time intervals, 99 ID variable, see time ID variable DATASOURCE procedure, 593 ID variable for time series data, 67 ID variables, 2623 ARIMA procedure, 263 DATE, 71 DATETIME, 71 EXPAND procedure, 777, 779
for interleaved time series, 80 for time series data, 67 FORECAST procedure, 839 PANEL procedure, 1321 SIMLIN procedure, 1665 sorting by, 72 STATESPACE procedure, 1734 TSCSREG procedure, 1927 X11 procedure, 2240, 2242 X12 procedure, 2311 ID variables for cross-sectional dimensions, 79 interleaved time series, 80 time series cross-sectional form, 79 IGARCH model AUTOREG procedure, 342 IMF balance of payment statistics DATASOURCE procedure, 653 IMF data files, see DATASOURCE procedure IMF direction of trade statistics DATASOURCE procedure, 653 IMF Economic Information System data files DATASOURCE procedure, 652 IMF government finance statistics DATASOURCE procedure, 654 IMF International Financial Statistics DATASOURCE procedure, 571 IMF international financial statistics DATASOURCE procedure, 652 IML, see SAS/IML software IML Studio software, 55 impact multipliers SIMLIN procedure, 1667, 1672 impulse function intervention model and, 220 impulse response function VARMAX procedure, 2090, 2111 impulse response matrix of a state space model, 1748 in SAS data sets time series, 2608 in standard form output data sets, 83 incrementing by intervals date values, 98 incrementing dates INTNX function, 98 incrementing dates by time intervals, 97, 98 Independence BDS test for, 351, 396 Rank Version of von Neumann Ratio test for, 397
Rank version of von Neumann ratio test for, 360 Runs test for, 355, 396 Turning Point test for, 359, 396 independent variables, see predictor variables indexing OUT= data set, 593 indexing the OUT= data set DATASOURCE procedure, 582, 625 inequality restriction linear models, 692 nonlinear models, 1024, 1049, 1126 infinite memory forecasts ARIMA procedure, 261 infinite order AR representation VARMAX procedure, 2090 infinite order MA representation VARMAX procedure, 2090, 2111 informats date values, 69, 140 datetime values, 69, 140 time values, 140 informats for date values, 69 datetime values, 69 initial values, 361, 941 GARCH model, 361 initializations smoothing models, 2898 initializing lags MODEL procedure, 1212 SIMLIN procedure, 1670 innovation vector of a state space model, 1717 input block COMPUTAB procedure, 489 input data set, 2619, 2801 input data sets ENTROPY procedure, 707 MODEL procedure, 1154 input file DATASOURCE procedure, 582, 583 input matrix of a state space model, 1717 input series ARIMA procedure, 216 INPUT variables X12 procedure, 2313 inputs, see predictor variables installment loans, see LOAN procedure instrumental regression, 1059 instrumental variables, 1059 choice of, 1134 for nonlinear models, 1134
number to use, 1135 SYSLIN procedure, 1764 instruments, 1058 INTCK function calendar calculations and, 103 counting time intervals, 101 defined, 98 INTCK function and time intervals, 98, 101 interaction effects ARIMA procedure, 221 interest rates LOAN procedure, 894 interim multipliers SIMLIN procedure, 1663, 1668, 1671, 1672 interleaved data set, 2636 interleaved form output data sets, 82 interleaved form of time series data set, 80 interleaved time series and _TYPE_ variable, 80, 81 combined with cross-sectional dimension, 81 defined, 80 FORECAST procedure and, 80, 81 ID variables for, 80 plots of, 89 Internal Rate of Return, 3050 internal rate of return LOAN procedure, 896 internal variables MODEL procedure, 1203 international financial statistics data files, see DATASOURCE procedure International Monetary Fund data files, see DATASOURCE procedure DATASOURCE procedure, 652 interpolation between levels and rates, 123 between stocks and flows, 123 EXPAND procedure and, 121 of missing values, 122, 767 time series data, 123 to higher frequency, 122 to lower frequency, 122 interpolation methods EXPAND procedure, 783 interpolation of missing values, 122 time series data, 121, 122, 767 interpolation of missing values EXPAND procedure, 122 interpolation of time series
step function, 785 interrupted time series analysis, see intervention model interrupted time series model, see intervention model interval functions, see time intervals, functions interval functions and calendar calculations, 103 INTERVAL= option and time intervals, 84 intervals, see time intervals, 2623 intervention analysis, see intervention model intervention model ARIMA procedure, 216, 219, 222, 301 interrupted time series analysis, 220 interrupted time series model, 216 intervention analysis, 220 intervention model and impulse function, 220 step function, 220 intervention notation, 2914 intervention specification, 2823, 2825 interventions, 2913 automatic inclusion of, 2810 forecasting models, 2755 point, 2913 predictor variables, 2913 ramp, 2914 specifying, 2755 step, 2913 INTNX function calendar calculations and, 103 checking data periodicity, 102 computing ceiling of intervals, 101 computing ending date of intervals, 100 computing midpoint date of intervals, 100 computing widths of intervals, 100 defined, 97 incrementing dates, 98 normalizing dates in intervals, 100 INTNX function and date values, 98 time intervals, 97 introduced DIF function, 104 LAG function, 104 percent change calculations, 109 time variables, 94 inverse autocorrelation function ARIMA procedure, 243 invertibility ARIMA procedure, 259 VARMAX procedure, 2141 Investment Analysis System, 47
Investment Portfolio, 2988 invoking the system, 2612 IRoR, 3050 irregular component X11 procedure, 2228, 2234 iterated generalized method of moments, 1065 iterated seemingly unrelated regression SYSLIN procedure, 1797 iterated three-stage least squares SYSLIN procedure, 1797 Iterative Outlier Detection ARIMA procedure, 310 Jacobi method MODEL procedure, 1190 Jacobi method with General Form Equations MODEL procedure, 1191 Jacobian, 1058, 1077 Jarque-Bera test, 354 normality tests, 354 JMP, 57 JOIN method EXPAND procedure, 784 joint generalized least squares, see seemingly unrelated regression jointly dependent variables SYSLIN procedure, 1764 K-class estimation SYSLIN procedure, 1796 Kalman filter AUTOREG procedure, 373 STATESPACE procedure, 1718 used for state space modeling, 1718 KEEP in the DATA step SASEFAME engine, 2517 kernels, 1062, 1697 SPECTRA procedure, 1697 Kolmogorov-Smirnov test, 1098 normality tests, 1098 KPSS (Kwiatkowski, Phillips, Schmidt, Shin) test, 357 KPSS test, 357, 393 unit roots, 393 Kruskal-Wallis test, 2270 labeling variables DATASOURCE procedure, 590 LAG function alternatives to, 107 explained, 105 introduced, 104 MODEL procedure version, 107 multiperiod lags and, 108 percent change calculations and, 109, 110
pitfalls of, 106 LAG function and Lags, 105 lags, 104, 106 lag functions functions, 1209 MODEL procedure, 1209 lag lengths MODEL procedure, 1211 lag logic MODEL procedure, 1210 lagged dependent variables and tests for autocorrelation, 331 AUTOREG procedure, 331 lagged endogenous variables SYSLIN procedure, 1764 lagging time series data, 104–110 Lagrange multiplier test heteroscedasticity, 364 heteroscedasticity tests, 364 linear hypotheses, 694 nonlinear hypotheses, 962, 1056, 1128, 1458 Lags LAG function and, 105 lags LAG function and, 104, 106 MODEL procedure and, 107 multiperiod lagging, 108 percent change calculations and, 109, 110 RETAIN statement and, 107 SIMLIN procedure, 1670 lambda, 1078 language differences MODEL procedure, 1213 large problems MODEL procedure, 1095 leads calculation of, 111 multiperiod, 111 time series data, 111 Lee and King’s test, 403 Lee and King’s test for Heteroscedasticity, 403 left-hand side expressions nonlinear models, 1201 lengths of variables DATASOURCE procedure, 578, 590 level shifts forecasting models, 2760 specifying, 2760 levels contrasted with flows or rates, 768 levels, of classification variable, 531
LIBNAME libref SASEHAVR ‘physical name’ on Windows SASEFAME engine, 2512 LIBNAME libref SASEHAVR ‘physical name’on UNIX SASEFAME engine, 2512 LIBNAME interface engine for Fame database, see SASEFAME engine LIBNAME interface engine for Haver database, see SASEHAVR engine LIBNAME statement SASECRSP engine, 2398 SASEFAME engine, 2500 SASEHAVR engine, 2556 likelihood confidence intervals, 1132 MODEL procedure, 1132 Likelihood ratio test nonlinear hypotheses, 962 likelihood ratio test linear hypotheses, 694 nonlinear hypotheses, 1056, 1128 limitations on ordinary differential equations (ODEs), 1197 limitations on ordinary differential equations MODEL procedure, 1197 Limited Dependent Variable Models QLIM procedure, 1446 limited information maximum likelihood LIML estimation method, 1762 SYSLIN procedure, 1796 LIML estimation method, see limited information maximum likelihood linear trend curves, 2912 linear dependencies MODEL procedure, 1091 linear exponential smoothing, 2902 Holt smoothing model, 2902 smoothing models, 2902 linear hypotheses Lagrange multiplier test, 694 likelihood ratio test, 694 Wald test, 694 linear hypothesis testing, 1360 PANEL procedure, 1360 linear models equality restriction, 692 inequality restriction, 692 restricted estimation, 692 linear structural equations SIMLIN procedure, 1667 linear trend, 2742, 2912 forecasting models, 2742 linearized form
Durbin-Watson tests, 361 link function heteroscedasticity models, 363 Listing the Haver selection keys, OUTSELECT=ON SASEHAVR engine, 2557 Loan, 2993 LOAN procedure adjustable rate mortgage, 871, 872 amortization schedule, 900 balloon payment mortgage, 871, 872 break even analysis, 896 buydown rate loans, 871, 872 comparing loans, 879, 896, 900 continuous compounding, 894 fixed rate mortgage, 871, 872 installment loans, 871 interest rates, 894 internal rate of return, 896 loan repayment schedule, 900 loan summary table, 900 loans analysis, 871 minimum attractive rate of return, 896 mortgage loans, 871 output data sets, 897, 898 output table names, 900 present worth of cost, 896 rate adjustment cases, 890 taxes, 896 true interest rate, 896 types of loans, 872 loan repayment schedule LOAN procedure, 900 loan summary table LOAN procedure, 900 loans analysis, see LOAN procedure log transformations, 2895 log likelihood value, 354 log test, 2916 log transformation, see transformations log transformations ARIMA procedure, 262 LOGTEST macro, 160 logarithmic trend curves, 2913 logarithmic trend, 2913 logistic transformations, 2895 trend curves, 2912 logistic trend, 2912 logit QLIM Procedure, 1422 LOGTEST macro
log transformations, 160 output data sets, 161 SAS macros, 160 long-run relations testing VARMAX procedure, 2163 %MA and %AR macros combined, 1149 MA Initial Conditions conditional least squares, 1142 maximum likelihood, 1142 unconditional least squares, 1142 macros, see SAS macros MAE AUTOREG procedure, 382 main economic indicators (OECD) data files, see DATASOURCE procedure main economic indicators (OECD) data files in FAME.db, see SASEFAME engine managing forecasting project, 2827 managing forecasting projects, 2827 MAPE AUTOREG procedure, 382 Mardia’s test, 1098 normality tests, 1098 Marquardt method ARIMA procedure, 253 Marquardt-Levenberg method, 1078 MARR, see minimum attractive rate of return, 3071 mathematical functions, 51 functions, 1208 matrix language SAS/IML software, 54 maximizing likelihood functions, 56 maximum likelihood AR initial conditions, 1141 MA Initial Conditions, 1142 maximum likelihood method AUTOREG procedure, 375 MDC procedure binary data modeling example, 965 binary logit example, 965, 968 binary probit example, 965 bounds on parameter estimates, 935 BY groups, 936 conditional logit example, 968 conditional logit model, 914, 915, 950 goodness-of-fit measures, 961 Hausman’s specification and likelihood ratio tests for nested logit, 965 heteroscedastic extreme value model, 925, 952 ID groups, 936
introductory examples, 915 mixed logit model, 930, 953 multinomial discrete choice, 949 multinomial probit example, 971 multinomial probit model, 924, 955 nested logit example, 978 nested logit model, 920, 956 output table names, 964 restrictions on parameter estimates, 946 syntax, 932 Tests on Parameters, 962 mean absolute error statistics of fit, 2917 mean absolute percent error statistics of fit, 2012, 2917 mean percent error statistics of fit, 2918 mean prediction error statistics of fit, 2918 mean square error statistics of fit, 2012 mean squared error statistics of fit, 2917 MEANS procedure, 49 measurement equation observation equation, 1717 of a state space model, 1717 MELO estimation method, see minimum expected loss estimator memory requirements MODEL procedure, 1096 VARMAX procedure, 2193 menu interfaces to SAS/ETS software, 46, 47 merging series time series data, 117 merging time series data sets, 117 Michaelis-Menten Equations, 1124 midpoint dates of time intervals, 100 MINIC (Minimum Information Criterion) method, 246 minimization methods MODEL procedure, 1077 minimization summary MODEL procedure, 1080 minimum attractive rate of return LOAN procedure, 896 MARR, 896 minimum expected loss estimator MELO estimation method, 1796 SYSLIN procedure, 1796 minimum information criteria method VARMAX procedure, 2132
Minimum Information Criterion (MINIC) method, 246 missing observations contrasted with omitted observations, 78 missing values, 792, 1156 COMPUTAB procedure, 491 contrasted with omitted observations, 78 embedded in time series, 78 ENTROPY procedure, 706 FORECAST procedure, 839 interpolation of, 122 MODEL procedure, 1075, 1192 smoothing models, 2898 time series data, 767 time series data and, 77 VARMAX procedure, 2104 missing values and time series data, 77, 78 MISSONLY operator, 793 mixed logit model MDC procedure, 930, 953 Mixture of Distributions example, 1273 MMAE, 2896 MMSE, 2896 model evaluation, 2890 Model Identification ARIMA procedure, 303 model list, 2653, 2834 MODEL procedure adjacency graph, 1227 adjusted R squared, 1077 Almon lag polynomials, 1152 analyzing models, 1222 ARMA model, 1138 autoregressive models, 1138 auxiliary equations, 1124 block structure, 1227 character variables, 1204 Chow tests, 1131 collinearity diagnostics, 1082, 1091 compiler listing, 1220 control variables, 1202 controlling starting values, 1084 convergence criteria, 1078 cross-equation covariance matrix, 1076 cross-reference, 1219 dependency list, 1223 derivatives, 1207 diagnostics and debugging, 1217 Durbin-Watson, 1075 dynamic simulation, 1118, 1167 Empirical Distribution Estimation, 1073 equation translations, 1204
equation variables, 1201 estimation convergence problems, 1088 estimation methods, 1057 estimation of ordinary differential equations, 1120 forecasting, 1169 full information maximum likelihood, 1069 functions across time, 1209 Gaussian distribution, 1030 goal seeking, 1187 grid search, 1086 Hausman specification test, 1129 initializing lags, 1212 input data sets, 1154 internal variables, 1203 Jacobi method, 1190 Jacobi method with General Form Equations, 1191 lag functions, 1209 lag lengths, 1211 lag logic, 1210 language differences, 1213 large problems, 1095 likelihood confidence intervals, 1132 limitations on ordinary differential equations, 1197 linear dependencies, 1091 memory requirements, 1096 minimization methods, 1077 minimization summary, 1080 missing values, 1075, 1192 model variables, 1201 Monte Carlo simulation, 1266 Moore-Penrose generalized inverse, 1035 moving average models, 1138 Multivariate t-Distribution Estimation, 1072 n-period-ahead forecasting, 1167 nested iterations, 1077 Newton’s Method, 1190 nonadditive errors, 1109 normal distribution, 1030 ODS graph names, 1165 ordinary differential equations and goal seeking, 1125 output data sets, 1160 output table names, 1163 parameters, 1202 polynomial distributed lag models, 1152 program listing, 1218 program variables, 1204 properties of the estimates, 1075 quasi-random number generators, 1179 R squared, 1077, 1084 random-number generating functions, 1208
restrictions on parameters, 1148 S matrix, 1076 S-iterated methods, 1077 Seidel method, 1191 Seidel method with General Form Equations, 1191 SIMNLIN procedure, 995 simulated nonlinear least squares, 1069 simulation, 1169 solution mode output, 1181 solution modes, 1166, 1189 SOLVE Data Sets, 1198 starting values, 1081, 1088 static simulation, 1118 static simulations, 1167 stochastic simulation, 1170 storing programs, 1216 summary statistics, 1184 SYSNLIN procedure, 995 systems of ordinary differential equations, 1263 tests on parameters, 1128 time variable, 1124 troubleshooting estimation convergence problems, 1080 troubleshooting simulation problems, 1192 using models to forecast, 1169 using solution modes, 1166 variables in model program, 1200 _WEIGHT_ variable, 1102 MODEL procedure and differencing, 107 lags, 107 MODEL procedure version DIF function, 107 LAG function, 107 model selection, 2849 model selection criterion, 2730, 2838 model selection for X-11-ARIMA method X11 procedure, 2262 model selection list, 2839 model variables MODEL procedure, 1201 Model Viewer, 2655, 2843 graphs, 2647 plots, 2647 saving graphs and tables, 2857, 2859 Monte Carlo simulation, 1170, 1266 examples, 1266 MODEL procedure, 1266 Moore-Penrose generalized inverse, 1035 mortgage loans, see LOAN procedure moving average function, 1209 moving average models, 1139
MODEL procedure, 1138 moving averages percent change calculations, 110 moving between computer systems SAS data sets, 49 moving product and geometric mean operators, 797 moving rank operator, 796 moving seasonality test, 2270 moving t-value operators, 800 moving time window operators, 789 moving-average parameters ARIMA procedure, 259 multinomial discrete choice independence from irrelevant alternatives, 951 MDC procedure, 949 multinomial probit model MDC procedure, 924, 955 multiperiod leads, 111 multiperiod differences differencing, 108 multiperiod lagging lags, 108 multiperiod lags and DIF function, 108 LAG function, 108 summation, 112, 113 multiple selections, 2626 multiplicative model ARIMA model, 216 multiplicative seasonal smoothing, 2905 smoothing models, 2905 multipliers SIMLIN procedure, 1663, 1664, 1667, 1668, 1671, 1672 multipliers for higher order lags SIMLIN procedure, 1668, 1682 multivariate autocorrelations, 1721 normality tests, 1098 partial autocorrelations, 1740 multivariate forecasting STATESPACE procedure, 1716 multivariate GARCH Modeling VARMAX procedure, 2099 Multivariate Mixture of Distributions example, 1273 multivariate model diagnostic checks VARMAX procedure, 2148 Multivariate t-Distribution Estimation MODEL procedure, 1072 multivariate time series
STATESPACE procedure, 1716 n-period-ahead forecasting MODEL procedure, 1167 naming time intervals, 84, 128 naming model parameters ARIMA procedure, 259 national accounts data files (OECD), see DATASOURCE procedure national accounts data files (OECD) in FAME.db, see SASEFAME engine national income and product accounts, see DATASOURCE procedure DATASOURCE procedure, 634 negative log likelihood function, 1070 negative log-likelihood function, 1072 Nerlove variance components, 1342 nested iterations MODEL procedure, 1077 nested logit model MDC procedure, 920, 956 Newton’s Method MODEL procedure, 1190 Newton-Raphson optimization methods, 524, 941 Newton-Raphson method, 524, 941 NIPA Tables DATASOURCE procedure, 634 NLO Overview NLO system, 169 NLO system NLO Overview, 169 Options, 169 output table names, 187 remote monitoring, 185 nominal variables, see also classification variables NOMISS operator, 793 nonadditive errors MODEL procedure, 1109 nonlinear hypotheses Lagrange multiplier test, 962, 1056, 1128, 1458 Likelihood ratio test, 962 likelihood ratio test, 1056, 1128 Wald test, 962, 1056, 1128, 1458 nonlinear least-squares AUTOREG procedure, 375 nonlinear models equality restriction, 1049, 1126 functions of parameters, 1031 inequality restriction, 1024, 1049, 1126 left-hand side expressions, 1201
restricted estimation, 1024, 1049, 1126 test of hypotheses, 1055 nonmissing observations statistics of fit, 2916 nonseasonal ARIMA model notation, 2908 nonseasonal transfer function notation, 2910 nonstationarity, see stationarity normal distribution MODEL procedure, 1030 normality tests, 1098 Henze-Zirkler test, 1098 Jarque-Bera test, 354 Kolmogorov-Smirnov test, 1098 Mardia’s test, 1098 multivariate, 1098 Shapiro-Wilk test, 1098 normalizing dates in intervals INTNX function, 100 normalizing to intervals date values, 100 notation nonseasonal ARIMA model, 2908 nonseasonal transfer function, 2910 seasonal ARIMA model, 2909 seasonal transfer function, 2911 notation for ARIMA model, 210 ARMA model, 210 number of observations statistics of fit, 2916 number to use instrumental variables, 1135 numerator factors transfer function model, 221 OBJECT convergence measure, 1078 objective function, 1057 observation equation, see measurement equation observation numbers, 2873 as time ID, 2671 time ID variable, 2671 obtaining descriptive information DATASOURCE procedure, 569, 573–575, 594–597 ODS graph names ARIMA procedure, 279 AUTOREG procedure, 415 ENTROPY procedure, 710 ESM procedure, 749 EXPAND procedure, 803 MODEL procedure, 1165 SEVERITY procedure, 1561
SIMILARITY procedure, 1631 SYSLIN procedure, 1808 TIMESERIES procedure, 1899 UCM procedure, 2006 VARMAX procedure, 2191 ODS Graphics ARIMA procedure, 228 UCM procedure, 1946 X12 procedure, 2308 OECD ANA data files DATASOURCE procedure, 654 OECD annual national accounts DATASOURCE procedure, 654 OECD data files, see DATASOURCE procedure OECD data files in FAME.db, see SASEFAME engine OECD main economic indicators DATASOURCE procedure, 656 OECD MEI data files DATASOURCE procedure, 656 OECD QNA data files DATASOURCE procedure, 655 OECD quarterly national accounts DATASOURCE procedure, 655 of a state space model impulse response matrix, 1748 innovation vector, 1717 input matrix, 1717 measurement equation, 1717 state transition equation, 1717 state vector, 1716 transition equation, 1717 transition matrix, 1717 of a time series unit root, 158 of interleaved time series overlay plots, 89 of missing values interpolation, 122, 767 of time series distribution, 768 overlay plots, 88 sampling frequency, 71, 84, 122 simulation, 2788, 2882 stationarity, 213 summation, 112 time ranges, 77 of time series data set standard form, 76 time series cross-sectional form, 79 of time series observations frequency, 84, 122 periodicity, 71, 84, 122 omitted observations
contrasted with missing values, 78 defined, 78 replacing with missing values, 102 omitted observations in time series data, 78 one-way fixed effects model, 1332 random effects model, 1339 one-way fixed effects model PANEL procedure, 1332 one-way fixed-effects model, 1332 one-way random effects model PANEL procedure, 1339 one-way random-effects model, 1339 operations research SAS/OR software, 55 optimization methods Newton-Raphson, 524, 941 quasi-Newton, 362, 524, 941 trust region, 362, 524, 941 optimizations smoothing weights, 2899 Options NLO system, 169 options automatic model selection, 2796 order of calculations COMPUTAB procedure, 485 order statistics, see RANK procedure Ordinal Discrete Choice Modeling QLIM procedure, 1443 ordinary differential equations (ODEs) and goal seeking, 1125 differential algebraic equations, 1197 example, 1263 explosive differential equations, 1197 limitations on, 1197 systems of, 1263 ordinary differential equations and goal seeking MODEL procedure, 1125 Organization for Economic Cooperation and Development data files, see DATASOURCE procedure DATASOURCE procedure, 654 Organization for Economic Cooperation and Development data files in FAME.db, see SASEFAME engine orthogonal polynomials PDLREG procedure, 1396 OUT= data set indexing, 593 OUTALL= data set DATASOURCE procedure, 574 OUTBY= data set
DATASOURCE procedure, 573 OUTCONT= data set DATASOURCE procedure, 569, 575 Outlier Detection ARIMA procedure, 308 Output Data Sets VARMAX procedure, 2178 output data sets and the OUTPUT statement, 83 ARIMA procedure, 265, 267, 270, 272 AUTOREG procedure, 410 BOXCOXAR macro, 155 COMPUTAB procedure, 491 DATASOURCE procedure, 567, 592, 594–597 DFTEST macro, 159 different forms of, 82 ENTROPY procedure, 708 EXPAND procedure, 801 FORECAST procedure, 850, 852 in standard form, 83 interleaved form, 82 LOAN procedure, 897, 898 LOGTEST macro, 161 MODEL procedure, 1160 PANEL procedure, 1368, 1370 PDLREG procedure, 1409 produced by SAS/ETS procedures, 82 SIMLIN procedure, 1670, 1671 SPECTRA procedure, 1700 STATESPACE procedure, 1749, 1750 SYSLIN procedure, 1803, 1804 X11 procedure, 2265, 2266 Output Delivery System (ODS), 2846, 2886 OUTPUT statement SAS/ETS procedures using, 83 output table names ARIMA procedure, 275 AUTOREG procedure, 413 COUNTREG procedure, 547 ENTROPY procedure, 709 LOAN procedure, 900 MDC procedure, 964 MODEL procedure, 1163 NLO system, 187 PANEL procedure, 1371 PDLREG procedure, 1410 QLIM procedure, 1465 SIMLIN procedure, 1673 SPECTRA procedure, 1702 STATESPACE procedure, 1752 SYSLIN procedure, 1807 TSCSREG procedure, 1930 X11 procedure, 2279
over identification restrictions SYSLIN procedure, 1802 overlay plot of time series data, 88 overlay plots of interleaved time series, 89 of time series, 88 _TYPE_ variable and, 90 p-values for Durbin-Watson test, 329 panel data TSCSREG procedure, 1919 Panel GMM, 1352 GMM in Panel: Arellano and Bond’s Estimator, 1352 PANEL procedure Between Estimators, 1339 BY groups, 1320 Da Silva method, 1350 generalized least squares, 1348 HCCME =, 1361 ID variables, 1321 linear hypothesis testing, 1360 one-way fixed effects model, 1332 one-way random effects model, 1339 output data sets, 1368, 1370 output table names, 1371 Parks method, 1348 Pooled Estimator, 1339 predicted values, 1328 printed output, 1370 R-square measure, 1364 residuals, 1328 specification tests, 1364 two-way fixed effects model, 1333 two-way random effects model, 1342 Zellner’s two-stage method, 1349 parameter change vector, 1092 parameter estimates, 2661 parameter estimation, 2890 parameters MODEL procedure, 1202 UCM procedure, 1949–1960, 1962–1972 Pareto charts, 56 Parks method PANEL procedure, 1348 partial autocorrelations multivariate, 1740 partial autoregression coefficient VARMAX procedure, 2091, 2128 partial canonical correlation VARMAX procedure, 2091, 2131 partial correlation
VARMAX procedure, 2129 PDL, see polynomial distributed lags PDLREG procedure BY groups, 1402 confidence limits, 1406 distributed lag regression models, 1395 orthogonal polynomials, 1396 output data sets, 1409 output table names, 1410 polynomial distributed lags, 1396 predicted values, 1405 residuals, 1405 restricted estimation, 1406 percent change calculations at annual rates, 109 introduced, 109 moving averages, 110 period-to-period, 109 time series data, 109, 110 year-over-year, 109 yearly averages, 110 percent change calculations and DIF function, 109, 110 differencing, 109, 110 LAG function, 109, 110 lags, 109, 110 percent operators, 800 period of evaluation, 2734 period of fit, 2734, 2803, 2878 period-to-period percent change calculations, 109 Periodic Equivalent, see Uniform Periodic Equivalent periodicity changing by interpolation, 122, 765 of time series observations, 71, 84, 122 periodicity of time series data, 84, 122 periodicity of time series time intervals, 84, 122 periodogram SPECTRA procedure, 1690, 1701 PGARCH model, 379 AUTOREG procedure, 342 Power GARCH model, 379 Phillips-Ouliaris test, 355, 390 Phillips-Perron test, 355, 389 unit roots, 355–357, 389 Phillips-Perron tests, 234 Physical Names on Supported hosts SASEFAME engine, 2512 Physical path name syntax for variety of environments SASEFAME engine, 2512
pitfalls of DIF function, 106 LAG function, 106 plot axis and time intervals, 87 plot axis for time series SGPLOT procedure, 87 PLOT procedure, 50 plotting time series, 91 time series data, 91 plot reference lines and time intervals, 87 plots, see Model Viewer, see Time Series Viewer plots of interleaved time series, 89 plotting autocorrelations, 197 forecasts, 2664 prediction errors, 2657 residual, 91 time series data, 86 plotting time series PLOT procedure, 91 SGPLOT procedure, 86 Time Series Viewer procedure, 86 point interventions, 2913 point interventions, 2913 point-in-time values, 765, 768 polynomial distributed lag models MODEL procedure, 1152 polynomial distributed lags Almon lag polynomials, 1395 endpoint restrictions for, 1396, 1402 PDL, 1395 PDLREG procedure, 1396 Polynomial specification, 2783, 2812, 2851 pooled pooled estimator, 1339 Pooled Estimator PANEL procedure, 1339 pooled estimator, 1339 pooled, 1339 Portfolio, see Investment Portfolio Portmanteau Q test, 402 Portmanteau Q test for Heteroscedasticity, 402 power curve trend curves, 2913 power curve trend, 2913 Power GARCH model, see PGARCH model PPC convergence measure, 1078 Prais-Winsten estimates AUTOREG procedure, 375
PRED. variables, 1204 predetermined variables SYSLIN procedure, 1764 predicted values ARIMA procedure, 260 AUTOREG procedure, 369, 405, 406 conditional variance, 407 FORECAST procedure, 851 PANEL procedure, 1328 PDLREG procedure, 1405 SIMLIN procedure, 1661, 1666 STATESPACE procedure, 1745, 1749 structural, 369, 405, 1405 SYSLIN procedure, 1788 transformed models, 1111 predicting conditional variance, 407 prediction error covariance VARMAX procedure, 2089, 2122, 2124 prediction errors autocorrelations, 2658 plotting, 2657 residuals, 2726 stationarity, 2659 predictions smoothing models, 2898 predictive Chow test, 354, 405 predictive Chow tests, 1131 predictor variables forecasting models, 2739 independent variables, 2739 inputs, 2739 interventions, 2913 seasonal dummies, 2915 specifying, 2739 trend curves, 2912 Present Value Analysis, see Time Value Analysis present worth of cost LOAN procedure, 896 prewhitening ARIMA procedure, 250, 251 principal component, 1091 PRINT procedure, 50 printing SAS data sets, 50 printed output ARIMA procedure, 273 AUTOREG procedure, 412 PANEL procedure, 1370 SIMLIN procedure, 1671 STATESPACE procedure, 1751 SYSLIN procedure, 1805 X11 procedure, 2268 printing SAS data sets, 50
printing SAS data sets, see PRINT procedure probability functions, 51 PROBDF Function Dickey-Fuller test, 162 Financial Functions, 162 significance probabilities, 162 significance probabilities for Dickey-Fuller tests, 162 PROBDF function defined, 162 probit QLIM Procedure, 1422 produced by SAS/ETS procedures output data sets, 82 Producer Price Index Survey, see DATASOURCE procedure producing forecasts, 2632, 2852 producing forecasts, 2852 program flow COMPUTAB procedure, 482 program listing MODEL procedure, 1218 program variables MODEL procedure, 1204 programming statements COMPUTAB procedure, 479 Project Management window forecasting project, 2639 properties of the estimates MODEL procedure, 1075 properties of time series, 2681 PROTO procedure, 50 printing SAS data sets, 50 QGARCH model, 379 AUTOREG procedure, 342 Quadratic GARCH model, 379 QLIM Procedure, 1422 logit, 1422 probit, 1422 selection, 1422 tobit, 1422 QLIM procedure Bivariate Limited Dependent Variable Modeling, 1454 Box-Cox Modeling, 1453 BY groups, 1433 Censored Regression Models, 1446 Frontier, 1450 Heteroscedasticity, 1452 Limited Dependent Variable Models, 1446 Multivariate Limited Dependent Models, 1457
Ordinal Discrete Choice Modeling, 1443 Output, 1459 output table names, 1465 Selection Models, 1455 syntax, 1428 Tests on Parameters, 1458 Truncated Regression Models, 1449 Types of Tobit Model, 1447 quadratic trend curves, 2912 Quadratic GARCH model, see QGARCH model quadratic trend, 2912 quadrature spectrum cross-spectral analysis, 1701 SPECTRA procedure, 1701 qualitative variables, see classification variables quasi-Newton optimization methods, 362, 524, 941 quasi-Newton method, 362, 524, 941 AUTOREG procedure, 350 quasi-random number generators MODEL procedure, 1179 R convergence measure, 1078 R square statistic statistics of fit, 2012 R squared MODEL procedure, 1077, 1084 R-square measure PANEL procedure, 1364 R-square statistic statistics of fit, 2917 SYSLIN procedure, 1799 R-squared measure, 1364 ramp interventions, 2914 ramp function, see ramp interventions ramp interventions, 2914 ramp function, 2913 Ramsey’s test, see RESET test random effects model one-way, 1339 two-way, 1342 random number functions, 51 random walk model AUTOREG procedure, 425 random walk R-square statistics of fit, 2012, 2917 random-number functions functions, 1208 random-number generating functions MODEL procedure, 1208 random-walk with drift tests, 234 range of output observations
EXPAND procedure, 780 RANGE= option in the LIBNAME statement SASEFAME engine, 2520 RANK procedure, 50 order statistics, 50 Rank Version of von Neumann Ratio test, 397 Rank version of von Neumann ratio test, 360 Rank Version of von Neumann Ratio test for Independence, 397 Rank version of von Neumann ratio test for Independence, 360 rate adjustment cases LOAN procedure, 890 rates contrasted with stocks or levels, 768 ratio operators, 800 rational transfer functions ARIMA procedure, 222 reading time series data, 66, 125 reading data files DATASOURCE procedure, 567 reading from a Fame data base SASEFAME engine, 2501 reading from a Haver DLX database SASEHAVR engine, 2556 reading from CRSP data files SASECRSP engine, 2401 reading, with DATA step time series data, 123, 124 recommended for time series ID formats, 71 recursive residuals, 369, 382 reduced form coefficients SIMLIN procedure, 1667, 1672, 1676 SYSLIN procedure, 1801 reference forecasting models, 2736 SGPLOT procedure, 87 regression model with ARMA errors ARIMA procedure, 216, 218 regressor definition, 531 regressor selection, 2856 regressors forecasting models, 2747 specifying, 2747 relation to ARMA models state space models, 1747 Remote Fame Access, Using Fame CHLI SASEFAME engine, 2502 remote monitoring NLO system, 185 RENAME in the DATA step
SASEFAME engine, 2517 renaming SAS data sets, 49 renaming variables DATASOURCE procedure, 576, 591 replacing with missing values omitted observations, 102 represented by different series cross sectional dimensions, 79 represented with BY groups cross-sectional dimensions, 79 reserved words COMPUTAB procedure, 490 RESET test, 354 Ramsey’s test, 354 RESID. variables, 1104, 1109, 1204 residual plotting, 91 residual analysis, 2726 residuals, see prediction errors ARIMA procedure, 260 AUTOREG procedure, 369 FORECAST procedure, 851 PANEL procedure, 1328 PDLREG procedure, 1405 SIMLIN procedure, 1666 STATESPACE procedure, 1749 structural, 369, 1405 SYSLIN procedure, 1788 response variable, 531 restarting the SASEFAME engine SASEFAME engine, 2501 RESTRICT statement, 364, 692, 1049 restricted estimates STATESPACE procedure, 1735 restricted estimation, 364 linear models, 692 nonlinear models, 1024, 1049, 1126 PDLREG procedure, 1406 SYSLIN procedure, 1789, 1790 restricted vector autoregression, 1148 restrictions on parameters MODEL procedure, 1148 RETAIN statement computing lags, 107 RETAIN statement and differencing, 107 lags, 107 root mean square error statistics of fit, 2012, 2917 row blocks COMPUTAB procedure, 490 ROWxxxxx: label COMPUTAB procedure, 480
RPC convergence measure, 1078 Runs test, 355, 396 Runs test for Independence, 355, 396 S convergence measure, 1078 S matrix definition, 1058 MODEL procedure, 1076 S matrix used in estimation, 1076 S-iterated methods MODEL procedure, 1077 sample cross covariances VARMAX procedure, 2089, 2127 sample cross-correlations VARMAX procedure, 2089, 2127 sample data sets, 2608, 2621 sampling frequency changing by interpolation, 122 of time series, 71, 84, 122 time intervals and, 84 sampling frequency of time series data, 84, 122 sampling frequency of time series time intervals, 84, 122 SAS and CRSP Dates SASECRSP engine, 2416 SAS catalogs, see CATALOG procedure SAS data sets contents of, 49 copying, 49 DATA step, 49 moving between computer systems, 49 printing, 50 renaming, 49 sorting, 50 structured query language, 50 summarizing, 49, 50 transposing, 50 SAS data sets and time series data, 65 SAS DATA step SASECRSP engine, 2402 SASEFAME engine, 2501 SASEHAVR engine, 2557 SAS Date Format SASECRSP engine, 2416 SAS language features for time series data, 64 SAS macros BOXCOXAR macro, 154 DFPVALUE macro, 157 DFTEST macro, 158 LOGTEST macro, 160
macros, 153 SAS options statement, using VALIDVARNAME=ANY SASEFAME engine, 2512, 2517 SAS output data set SASECRSP engine, 2415 SASEFAME engine, 2508 SASEHAVR engine, 2563 SAS representation for date values, 68 datetime values, 69 SAS Risk Products, 60 SAS source statements, 2810 SAS YEARCUTOFF= option DATASOURCE procedure, 588 SAS/ETS procedures using OUTPUT statement, 83 SAS/GRAPH software, 52 graphics, 52 SAS/HPF, 52 SAS/IML software, 54 IML, 54 matrix language, 54 SAS/IML Studio software, 55 SAS/OR software, 55 operations research, 55 SAS/QC software, 56 statistical quality control, 56 SAS/STAT software, 53 SASECRSP engine @CRSPDB Date Informats, 2417 @CRSPDR Date Informats, 2417 @CRSPDT Date Informats, 2417 CONTENTS procedure, 2402 Converting Dates Using the CRSP Date Functions, 2416 CRSP and SAS Dates, 2416 CRSP Date Formats, 2416 CRSP Date Functions, 2416 CRSP Date Informats, 2417 CRSP Integer Date Format, 2416 CRSPDB_SASCAL environment variable, 2401 CRSPDCI Date Functions, 2418 CRSPDCS Date Functions, 2418 CRSPDI2S Date Function, 2418 CRSPDIC Date Functions, 2418 CRSPDS2I Date Function, 2418 CRSPDSC Date Functions, 2418 CRSPDT Date Formats, 2416 Environment variable, CRSPDB_SASCAL, 2401 LIBNAME statement, 2398 reading from CRSP data files, 2401
SAS and CRSP Dates, 2416 SAS DATA step, 2402 SAS Date Format, 2416 SAS output data set, 2415 SETID option, 2401 SQL procedure, creating a view, 2402 SASEFAME engine CONTENTS procedure, 2501 convert option, 2501 creating a Fame view, 2500 DOT as a GLUE character, 2507 DRI data files in FAME.db , 2500 DRI/McGraw-Hill data files in FAME.db, 2500 DROP in the DATA step, 2517 Fame data files, 2500 Fame glue symbol named DOT, 2512 Fame Information Services Databases, 2500 fatal error when reading from a Fame data base, 2501 finishing the Fame CHLI, 2501 GLUE symbol, 2507 KEEP in the DATA step, 2517 LIBNAME libref SASEHAVR ‘physical name’ on Windows, 2512 LIBNAME libref SASEHAVR ‘physical name’on UNIX, 2512 LIBNAME interface engine for Fame databases, 2500 LIBNAME statement, 2500 main economic indicators (OECD) data files in FAME.db, 2500 national accounts data files (OECD) in FAME.db, 2500 OECD data files in FAME.db, 2500 Organization for Economic Cooperation and Development data files in FAME.db, 2500 Physical Names on Supported hosts, 2512 Physical path name syntax for variety of environments, 2512 RANGE= option in the LIBNAME statement, 2520 reading from a Fame data base, 2501 Remote Fame Access, Using Fame CHLI, 2502 RENAME in the DATA step, 2517 restarting the SASEFAME engine, 2501 SAS DATA step, 2501 SAS options statement, using VALIDVARNAME=ANY, 2512, 2517 SAS output data set, 2508 Special characters in SAS Variable names, the glue symbol DOT, 2512
SQL procedure, using clause, 2501 SQL procedure,creating a view, 2501 Supported hosts, 2500 Using CROSSLIST= option to create a view, 2502 Using Fame expressions and Fame functions in an INSET, 2502 Using INSET= option with the CROSSLIST= option to create a view, 2502 Using INSET= option with the KEEPLIST= clause to create a view, 2502 Using KEEPLIST clause to create a view, 2502 Using RANGE= option to create a view, 2502 Using WHERE clause with INSET= option to create a view, 2502 Using WILDCARD= option to create a view, 2502 VALIDVARNAME=ANY, SAS option statement, 2512, 2517 viewing a Fame database, 2500 WHERE in the DATA step, 2520 SASEHAVR engine creating a Haver view, 2555 frequency option, 2556 Haver data files, 2555 Haver Information Services Databases, 2555 LIBNAME interface engine for Haver databases, 2555 LIBNAME statement, 2556 Listing the Haver selection keys, OUTSELECT=ON, 2557 reading from a Haver DLX database, 2556 SAS DATA step, 2557 SAS output data set, 2563 viewing a Haver database, 2555 SASHELP library, 2621 saving and restoring forecasting project, 2640 Savings, 3000 SBC, see Schwarz Bayesian criterion, see Schwarz Bayesian information criterion scale operators, 799 SCAN (Smallest Canonical) correlation method, 248 Schwarz Bayesian criterion ARIMA procedure, 254 AUTOREG procedure, 383 SBC, 254 Schwarz Bayesian information criterion BIC, 2917 SBC, 2917
statistics of fit, 2917 seasonal adjustment time series data, 2228, 2297 X11 procedure, 2228, 2234 X12 procedure, 2297 seasonal ARIMA model notation, 2909 Seasonal ARIMA model options, 2860 seasonal component X11 procedure, 2228 X12 procedure, 2297 seasonal dummies, 2915 predictor variables, 2915 seasonal dummy variables forecasting models, 2767 specifying, 2767 seasonal exponential smoothing, 2904 smoothing models, 2904 seasonal forecasting additive Winters method, 846 FORECAST procedure, 843, 846 WINTERS method, 843 seasonal model ARIMA model, 215 ARIMA procedure, 215 seasonal transfer function notation, 2911 seasonal unit root test, 250 seasonality FORECAST procedure, 848 testing for, 158 seasonality test, 2916 seasonality tests, 2270 seasonality, testing for DFTEST macro, 158 second difference DIF function, 108 differencing, 108 See ordinary differential equations differential equations, 1120 seemingly unrelated regression, 1060 cross-equation covariance matrix, 1060 joint generalized least squares, 1762 SUR estimation method, 1762 SYSLIN procedure, 1770, 1797 Zellner estimation, 1762 Seidel method MODEL procedure, 1191 Seidel method with General Form Equations MODEL procedure, 1191 selecting from a list forecasting models, 2685 selection QLIM Procedure, 1422
selection criterion, 2838 sequence operators, 797 serial correlation correction AUTOREG procedure, 320 series autocorrelations, 2723 series adjustments, 2895 series diagnostics, 2681, 2861, 2915 series selection, 2862 series transformations, 2724 set operators, 798 SETID option SASECRSP engine, 2401 SETMISS operator, 793 SEVERITY procedure BY groups, 1514 ODS graph names, 1561 SGMM simulated generalized method of moments, 1066 SGPLOT procedure plot axis for time series, 87 plotting time series, 86 reference, 87 time series data, 86 Shapiro-Wilk test, 1098 normality tests, 1098 sharing forecasting project, 2644 Shewhart control charts, 56 shifted time intervals, 129 shifted intervals, see time intervals, shifted significance probabilities Dickey-Fuller test, 162 PROBDF Function, 162 unit root, 162 significance probabilities for Dickey-Fuller test, 157 significance probabilities for Dickey-Fuller tests PROBDF Function, 162 SIMILARITY procedure BY groups, 1598 ODS graph names, 1631 SIMLIN procedure BY groups, 1664 dynamic models, 1660, 1661, 1667, 1682 dynamic multipliers, 1667, 1668 dynamic simulation, 1661 EST= data set, 1669 ID variables, 1665 impact multipliers, 1667, 1672 initializing lags, 1670 interim multipliers, 1663, 1668, 1671, 1672 lags, 1670
linear structural equations, 1667 multipliers, 1663, 1664, 1667, 1668, 1671, 1672 multipliers for higher order lags, 1668, 1682 output data sets, 1670, 1671 output table names, 1673 predicted values, 1661, 1666 printed output, 1671 reduced form coefficients, 1667, 1672, 1676 residuals, 1666 simulation, 1661 statistics of fit, 1672 structural equations, 1667 structural form, 1667 total multipliers, 1664, 1668, 1671, 1672 TYPE=EST data set, 1667 SIMNLIN procedure, see MODEL procedure simple data set, 2635 simple exponential smoothing, 2900 smoothing models, 2900 simulated method of moments GMM, 1066 simulated nonlinear least squares MODEL procedure, 1069 simulating ARIMA model, 2788, 2882 Simulating from a Mixture of Distributions examples, 1273 simulation MODEL procedure, 1169 of time series, 2788, 2882 SIMLIN procedure, 1661 time series, 2788, 2882 simultaneous equation bias, 1059 SYSLIN procedure, 1763 single equation estimators SYSLIN procedure, 1796 single exponential smoothing, see exponential smoothing sliding spans analysis, 2254 Smallest Canonical (SCAN) correlation method, 248 SMM, 1066 GMM, 1066 SMM simulated method of moments, 1066 smoothing equations, 2897 smoothing models, 2897 smoothing model specification, 2868, 2870 smoothing models calculations, 2897 damped-trend exponential smoothing, 2903 double exponential smoothing, 2901 exponential smoothing, 2897
forecasting models, 2690, 2897 initializations, 2898 linear exponential smoothing, 2902 missing values, 2898 multiplicative seasonal smoothing, 2905 predictions, 2898 seasonal exponential smoothing, 2904 simple exponential smoothing, 2900 smoothing equations, 2897 smoothing state, 2897 smoothing weights, 2899 specifying, 2690 standard errors, 2900 underlying model, 2897 Winters Method, 2906, 2907 smoothing state, 2897 smoothing models, 2897 smoothing weights, 2870, 2899 additive-invertible region, 2899 boundaries, 2899 FORECAST procedure, 847 optimizations, 2899 smoothing models, 2899 specifications, 2899 weights, 2899 solution mode output MODEL procedure, 1181 solution modes MODEL procedure, 1166, 1189 SOLVE Data Sets MODEL procedure, 1198 SORT procedure, 50 sorting, 50 sorting, see SORT procedure forecasting models, 2732, 2809 SAS data sets, 50 time series data, 72 sorting by ID variables, 72 Special characters in SAS Variable names, the glue symbol DOT SASEFAME engine, 2512 specification tests PANEL procedure, 1364 specifications smoothing weights, 2899 specifying adjustments, 2750 ARIMA models, 2693 combination models, 2710 custom models, 2700 dynamic regression, 2751 Factored ARIMA models, 2696 forecasting models, 2681
interventions, 2755 level shifts, 2760 predictor variables, 2739 regressors, 2747 seasonal dummy variables, 2767 smoothing models, 2690 state space models, 1726 time ID variable, 2877 trend changes, 2758 trend curves, 2743 SPECTRA procedure BY groups, 1694 Chirp-Z algorithm, 1696 coherency of cross-spectrum, 1701 cospectrum estimate, 1701 cross-periodogram, 1701 cross-spectral analysis, 1690, 1701 cross-spectrum, 1701 fast Fourier transform, 1696 finite Fourier transform, 1690 Fourier coefficients, 1701 Fourier transform, 1690 frequency, 1700 kernels, 1697 output data sets, 1700 output table names, 1702 periodogram, 1690, 1701 quadrature spectrum, 1701 spectral analysis, 1690 spectral density estimate, 1690, 1701 spectral window, 1695 white noise test, 1699, 1702 spectral analysis SPECTRA procedure, 1690 spectral density estimate SPECTRA procedure, 1690, 1701 spectral window SPECTRA procedure, 1695 SPLINE method EXPAND procedure, 783 splitting series time series data, 116 splitting time series data sets, 116 SQL procedure, 50 structured query language, 50 SQL procedure, creating a view SASECRSP engine, 2402 SQL procedure, using clause SASEFAME engine, 2501 SQL procedure,creating a view SASEFAME engine, 2501 square root transformations, 2895 square root transformation, see transformations
stable seasonality test, 2270 standard errors smoothing models, 2900 standard form of time series data set, 76 standard form of time series data, 76 STANDARD procedure, 50 standardized values, 50 standardized values, see STANDARD procedure starting dates of time intervals, 99, 100 starting values GARCH model, 350 MODEL procedure, 1081, 1088 state and area employment, hours, and earnings survey, see DATASOURCE procedure state space model UCM procedure, 1979 state space models form of, 1716 relation to ARMA models, 1747 specifying, 1726 state vector of, 1716 STATESPACE procedure, 1716 state transition equation of a state space model, 1717 state vector of a state space model, 1716 state vector of state space models, 1716 state-space representation VARMAX procedure, 2105 STATESPACE procedure automatic forecasting, 1716 BY groups, 1734 canonical correlation analysis, 1718, 1741 confidence limits, 1749 differencing, 1735 forecasting, 1716, 1745 ID variables, 1734 Kalman filter, 1718 multivariate forecasting, 1716 multivariate time series, 1716 output data sets, 1749, 1750 output table names, 1752 predicted values, 1745, 1749 printed output, 1751 residuals, 1749 restricted estimates, 1735 state space models, 1716 time intervals, 1733 Yule-Walker equations, 1738 static simulation, 1118
MODEL procedure, 1118 static simulations MODEL procedure, 1167 stationarity and state space models, 1719 ARIMA procedure, 198 nonstationarity, 198 of time series, 213 prediction errors, 2659 testing for, 158 VARMAX procedure, 2133, 2141 stationarity tests, 234, 250, 355 stationarity, testing for DFTEST macro, 158 statistical quality control SAS/QC software, 56 statistics of fit, 2011, 2653, 2662, 2872, 2916 adjusted R-square, 2012, 2917 Akaike’s information criterion, 2917 Amemiya’s prediction criterion, 2917 Amemiya’s R-square, 2012, 2917 corrected sum of squares, 2917 error sum of squares, 2917 goodness of fit, 2662 goodness-of-fit statistics, 2011, 2916 mean absolute error, 2917 mean absolute percent error, 2012, 2917 mean percent error, 2918 mean prediction error, 2918 mean square error, 2012 mean squared error, 2917 nonmissing observations, 2916 number of observations, 2916 R square statistic, 2012 R-square statistic, 2917 random walk R-square, 2012, 2917 root mean square error, 2012, 2917 Schwarz Bayesian information criterion, 2917 SIMLIN procedure, 1672 uncorrected sum of squares, 2917 step interventions, 2913 step function, see step interventions interpolation of time series, 785 intervention model and, 220 step interventions, 2913 step function, 2913 STEP method EXPAND procedure, 785 STEPAR method FORECAST procedure, 840 stepwise autoregression AUTOREG procedure, 332
FORECAST procedure, 818, 840 stochastic simulation MODEL procedure, 1170 stock data files, see DATASOURCE procedure stocks contrasted with flow variables, 768 stored in SAS data sets time series data, 75 storing programs MODEL procedure, 1216 structural predicted values, 369, 405, 1405 residuals, 369, 1405 structural change Chow test for, 352 structural equations SIMLIN procedure, 1667 structural form SIMLIN procedure, 1667 structural predictions AUTOREG procedure, 405 structured query language, see SQL procedure SAS data sets, 50 subset model ARIMA model, 215 ARIMA procedure, 215 AUTOREG procedure, 334 subsetting data, see WHERE statement subsetting data files DATASOURCE procedure, 567, 580 summarizing SAS data sets, 49, 50 summary of time intervals, 131 summary statistics MODEL procedure, 1184 summation higher order sums, 113 multiperiod lags and, 112, 113 of time series, 112 summation of time series data, 112, 113 Supported hosts SASEFAME engine, 2500 SUR estimation method, see seemingly unrelated regression Switching Regression example examples, 1269 syntax for date values, 68 datetime values, 69 time intervals, 84 time values, 69 SYSLIN procedure
Basmann test, 1787, 1802 BY groups, 1785 endogenous variables, 1764 exogenous variables, 1764 full information maximum likelihood, 1772, 1797 Fuller’s modification to LIML, 1802 instrumental variables, 1764 iterated seemingly unrelated regression, 1797 iterated three-stage least squares, 1797 jointly dependent variables, 1764 K-class estimation, 1796 lagged endogenous variables, 1764 limited information maximum likelihood, 1796 minimum expected loss estimator, 1796 ODS graph names, 1808 output data sets, 1803, 1804 output table names, 1807 over identification restrictions, 1802 predetermined variables, 1764 predicted values, 1788 printed output, 1805 R-square statistic, 1799 reduced form coefficients, 1801 residuals, 1788 restricted estimation, 1789, 1790 seemingly unrelated regression, 1770, 1797 simultaneous equation bias, 1763 single equation estimators, 1796 system weighted MSE, 1799 system weighted R-square, 1799, 1805 tests of hypothesis, 1791, 1792 three-stage least squares, 1770, 1797 two-stage least squares, 1767, 1796 SYSNLIN procedure, see MODEL procedure system weighted MSE SYSLIN procedure, 1799 system weighted R-square SYSLIN procedure, 1799, 1805 systems of ordinary differential equations (ODEs), 1263 systems of differential equations examples, 1263 systems of ordinary differential equations MODEL procedure, 1263 t distribution GARCH model, 380 table cells, direct access to COMPUTAB procedure, 490 table names UCM procedure, 2003 TABULATE procedure, 50
tabulating data, 50 tabulating data, see TABULATE procedure taxes LOAN procedure, 896 tentative order selection VARMAX procedure, 2127 test of hypotheses nonlinear models, 1055 TEST statement, 365 testing for heteroscedasticity, 334 seasonality, 158 stationarity, 158 unit root, 158 testing order of differencing, 158 testing overidentifying restrictions, 1065 tests of hypothesis SYSLIN procedure, 1791, 1792 tests of parameters, 365, 693, 1055 tests on parameters MODEL procedure, 1128 TGARCH model, 379 AUTOREG procedure, 342 Threshold GARCH model, 379 The D-method example, 1269 three-stage least squares, 1061 3SLS estimation method, 1762 SYSLIN procedure, 1770, 1797 Threshold GARCH model, see TGARCH model time functions, 94 time ID creation, 2873, 2875, 2876 time ID variable, 2617 creating, 2667 ID variable, 2617 observation numbers, 2671 specifying, 2877 time intervals, 2623 alignment of, 130 ARIMA procedure, 263 calendar calculations and, 103 ceiling of, 101 checking data periodicity, 102 counting, 98, 101 data frequency, 2612 date values, 128 datetime values, 128 ending dates of, 100 examples of, 134 EXPAND procedure, 779 EXPAND procedure and, 122 FORECAST procedure, 839 frequency of data, 2612
functions, 147 functions for, 97, 147 ID values for, 99 incrementing dates by, 97, 98 INTCK function and, 98, 101 INTERVAL= option and, 84 intervals, 84 INTNX function and, 97 midpoint dates of, 100 naming, 84, 128 periodicity of time series, 84, 122 plot axis and, 87 plot reference lines and, 87 sampling frequency of time series, 84, 122 shifted, 129 starting dates of, 99, 100 STATESPACE procedure, 1733 summary of, 131 syntax for, 84 UCM procedure, 1959 use with SAS/ETS procedures, 85 VARMAX procedure, 2084 widths of, 100, 780 time intervals and calendar calculations, 103 date values, 99 frequency, 84, 122 sampling frequency, 84 time intervals, functions interval functions, 97 time intervals, shifted shifted intervals, 129 time range DATASOURCE procedure, 588 time range of data DATASOURCE procedure, 570 time ranges, 2734, 2803, 2878 of time series, 77 time ranges of time series data, 77 time series definition, 2608 diagnostic tests, 2681 in SAS data sets, 2608 simulation, 2788, 2882 time series cross sectional form TSCSREG procedure and, 80 time series cross-sectional form BY groups and, 79 ID variables for, 79 of time series data set, 79 TSCSREG procedure and, 1919 time series cross-sectional form of time series data set, 79
time series data aggregation of, 765, 768 changing periodicity, 122, 765 converting frequency of, 765 differencing, 104–110 distribution of, 768 embedded missing values in, 78 giving dates to, 67 ID variable for, 67 interpolation, 123 interpolation of, 121, 122, 767 lagging, 104–110 leads, 111 merging series, 117 missing values, 767 missing values and, 77, 78 omitted observations in, 78 overlay plot of, 88 percent change calculations, 109, 110 periodicity of, 84, 122 PLOT procedure, 91 plotting, 86 reading, 66, 125 reading, with DATA step, 123, 124 sampling frequency of, 84, 122 SAS data sets and, 65 SAS language features for, 64 seasonal adjustment, 2228, 2297 SGPLOT procedure, 86 sorting, 72 splitting series, 116 standard form of, 76 stored in SAS data sets, 75 summation of, 112, 113 time ranges of, 77 Time Series Viewer, 86 transformation of, 770, 786 transposing, 117, 119 time series data and missing values, 77 time series data set interleaved form of, 80 time series cross-sectional form of, 79 time series forecasting, 2880 Time Series Forecasting System invoking, 2774 invoking from SAS/AF and SAS/EIS applications, 2774 running in unattended mode, 2774 time series methods FORECAST procedure, 830 time series variables DATASOURCE procedure, 568, 593 Time Series Viewer, 2647, 2719, 2883
graphs, 2647 invoking, 2773 plots, 2647 saving graphs and tables, 2857, 2859 time series data, 86 Time Series Viewer procedure plotting time series, 86 time trend models FORECAST procedure, 828 Time Value Analysis, 3048 time values defined, 69 formats, 146 functions, 147 informats, 140 syntax for, 69 time variable, 1124 MODEL procedure, 1124 time variables computing from datetime values, 97 introduced, 94 TIMEPLOT procedure, 50, 92 TIMESERIES procedure BY groups, 1860 ODS graph names, 1899 to higher frequency interpolation, 122 to lower frequency interpolation, 122 to SAS/ETS software menu interfaces, 46, 47 to standard form transposing time series, 117, 119 tobit QLIM Procedure, 1422 Toeplitz matrix AUTOREG procedure, 371 total multipliers SIMLIN procedure, 1664, 1668, 1671, 1672 trading-day component X11 procedure, 2228, 2234 transfer function model ARIMA procedure, 216, 221, 254 denominator factors, 222 numerator factors, 221 transfer functions, 2910 forecasting models, 2910 transformation of time series data, 770, 786 transformation of time series EXPAND procedure, 770, 786 transformations, 2866 Box Cox, 2895 Box Cox transformation, 2895
log, 2895 log transformation, 2895 logistic, 2895 square root, 2895 square root transformation, 2895 transformed models predicted values, 1111 transition equation of a state space model, 1717 transition matrix of a state space model, 1717 TRANSPOSE procedure, 50, 117, 119, 120, 124 transposing SAS data sets, 50 TRANSPOSE procedure and transposing time series, 117 transposing SAS data sets, 50 time series data, 117, 119 transposing SAS data sets, see TRANSPOSE procedure transposing time series cross-sectional dimensions, 119 from interleaved form, 117 from standard form, 120 to standard form, 117, 119 TRANSPOSE procedure and, 117 trend changes specifying, 2758 trend curves, 2912 cubic, 2912 exponential, 2913 forecasting models, 2743 hyperbolic, 2913 linear, 2912 logarithmic, 2913 logistic, 2912 power curve, 2913 predictor variables, 2912 quadratic, 2912 specifying, 2743 trend cycle component X11 procedure, 2228, 2234 trend test, 2916 TRIM operator, 792 TRIMLEFT operator, 792 TRIMRIGHT operator, 792 triple exponential smoothing, see exponential smoothing troubleshooting estimation convergence problems MODEL procedure, 1080 troubleshooting simulation problems MODEL procedure, 1192 true interest rate LOAN procedure, 896
Truncated Regression Models QLIM procedure, 1449 trust region optimization methods, 362, 524, 941 trust region method, 362, 524, 941 AUTOREG procedure, 350 TSCSREG procedure BY groups, 1927 estimation techniques, 1922 ID variables, 1927 output table names, 1930 panel data, 1919 TSCSREG procedure and time series cross sectional form, 80 time series cross-sectional form, 1919 TSVIEW command, 2773 Turning Point test, 359, 396 Turning Point test for Independence, 359, 396 two-stage least squares, 1059 2SLS estimation method, 1762 SYSLIN procedure, 1767, 1796 two-step full transform method AUTOREG procedure, 374 two-way fixed effects model, 1333 random effects model, 1342 two-way fixed effects model PANEL procedure, 1333 two-way fixed-effects model, 1333 two-way random effects model PANEL procedure, 1342 two-way random-effects model, 1342 type of input data file DATASOURCE procedure, 582 _TYPE_ variable and interleaved time series, 80, 81 overlay plots, 90 TYPE=EST data set SIMLIN procedure, 1667 types of loans LOAN procedure, 872 Types of Tobit Model QLIM procedure, 1447 U.S. Bureau of Economic Analysis data files DATASOURCE procedure, 634 U.S. Bureau of Labor Statistics data files DATASOURCE procedure, 635 UCM procedure BY groups, 1952 ODS graph names, 2006 ODS Graphics, 1946 ODS table names, 2003
parameters, 1949–1960, 1962–1972 state space model, 1979 Statistical Graphics, 1992 syntax, 1943 table names, 2003 time intervals, 1959 unattended mode, 2774 unconditional forecasts ARIMA procedure, 261 unconditional least squares AR initial conditions, 1141 MA Initial Conditions, 1142 uncorrected sum of squares statistics of fit, 2917 underlying model smoothing models, 2897 Uniform Periodic Equivalent, 3052 unit root Dickey-Fuller test, 162 of a time series, 158 significance probabilities, 162 testing for, 158 unit roots KPSS test, 393 Phillips-Perron test, 355–357, 389 univariate autoregression, 1143 univariate model diagnostic checks VARMAX procedure, 2149 univariate moving average models, 1149 UNIVARIATE procedure, 50, 1267 descriptive statistics, 50 unlinking viewer windows, 2723 unrestricted vector autoregression, 1145 use with SAS/ETS procedures time intervals, 85 used for state space modeling Kalman filter, 1718 used to select state space models Akaike information criterion, 1739 vector autoregressive models, 1738 Yule-Walker estimates, 1738 Using CROSSLIST= option to create a view SASEFAME engine, 2502 Using Fame expressions and Fame functions in an INSET SASEFAME engine, 2502 Using INSET= option with the CROSSLIST= option to create a view SASEFAME engine, 2502 Using INSET= option with the KEEPLIST= clause to create a view SASEFAME engine, 2502 Using KEEPLIST clause to create a view SASEFAME engine, 2502
using models to forecast
    MODEL procedure, 1169
using RANGE= option to create a view
    SASEFAME engine, 2502
using solution modes
    MODEL procedure, 1166
using WHERE clause with INSET= option to create a view
    SASEFAME engine, 2502
using WILDCARD= option to create a view
    SASEFAME engine, 2502
V matrix
    Generalized Method of Moments, 1062, 1067
VALIDVARNAME=ANY, SAS option statement
    SASEFAME engine, 2512, 2517
variable list
    DATASOURCE procedure, 591
variables in model program
    MODEL procedure, 1200
variance components
    Fuller-Battese, 1340
    Nerlove, 1342
    Wallace-Hussain, 1341
    Wansbeek-Kapteyn, 1340
VARMAX procedure
    Akaike information criterion, 2148
    asymptotic distribution of impulse response functions, 2135, 2143
    asymptotic distribution of the parameter estimation, 2143
    Bayesian vector autoregressive models, 2096, 2139
    cointegration, 2150
    cointegration testing, 2094, 2154
    common trends, 2150
    common trends testing, 2096, 2151
    computational details, 2192
    confidence limits, 2179
    convergence problems, 2192
    covariance stationarity, 2174
    CPU requirements, 2193
    decomposition of prediction error covariance, 2089, 2125
    Dickey-Fuller test, 2094
    differencing, 2086
    dynamic simultaneous equation models, 2108
    example of Bayesian VAR modeling, 2058
    example of Bayesian VECM modeling, 2065
    example of causality testing, 2073
    example of cointegration testing, 2061
    example of multivariate GARCH modeling, 2175
    example of restricted parameter estimation and testing, 2071
    example of VAR modeling, 2051
    example of VARMA modeling, 2144
    example of vector autoregressive modeling with exogenous variables, 2066
    example of vector error correction modeling, 2060
    forecasting, 2122
    forecasting of Bayesian vector autoregressive models, 2140
    Granger causality test, 2136
    impulse response function, 2090, 2111
    infinite order AR representation, 2090
    infinite order MA representation, 2090, 2111
    invertibility, 2141
    long-run relations testing, 2163
    memory requirements, 2193
    minimum information criteria method, 2132
    missing values, 2104
    multivariate GARCH modeling, 2099
    multivariate model diagnostic checks, 2148
    ODS graph names, 2191
    output data sets, 2178
    partial autoregression coefficient, 2091, 2128
    partial canonical correlation, 2091, 2131
    partial correlation, 2129
    prediction error covariance, 2089, 2122, 2124
    sample cross covariances, 2089, 2127
    sample cross-correlations, 2089, 2127
    state-space representation, 2105
    stationarity, 2133, 2141
    tentative order selection, 2127
    time intervals, 2084
    univariate model diagnostic checks, 2149
    vector autoregressive models, 2133
    vector autoregressive models with exogenous variables, 2136
    vector autoregressive moving-average models, 2104, 2141
    vector error correction models, 2098, 2153
    weak exogeneity testing, 2165
    Yule-Walker estimates, 2092
vector autoregressive models, 1148
    used to select state space models, 1738
    VARMAX procedure, 2133
vector autoregressive models with exogenous variables
    VARMAX procedure, 2136
vector autoregressive moving-average models
    VARMAX procedure, 2104, 2141
vector error correction models
    VARMAX procedure, 2098, 2153
vector moving average models, 1151
viewing a Fame database, see SASEFAME engine
viewing a Haver database, see SASEHAVR engine
viewing time series, 2647
Wald test
    linear hypotheses, 694
    nonlinear hypotheses, 962, 1056, 1128, 1458
Wallace-Hussain
    variance components, 1341
Wansbeek-Kapteyn
    variance components, 1340
weak exogeneity testing
    VARMAX procedure, 2165
_WEIGHT_ variable
    MODEL procedure, 1102
weights, see smoothing weights
WHERE in the DATA step
    SASEFAME engine, 2520
WHERE statement
    subsetting data, 51
white noise test
    SPECTRA procedure, 1699, 1702
white noise test of the residuals, 237
white noise test of the series, 235
White’s test, 1100
    heteroscedasticity tests, 1100
widths of time intervals, 100, 780
WINTERS method
    seasonal forecasting, 843
Winters method, 2906, 2907
    FORECAST procedure, 818, 843
    Holt-Winters method, 847, 2906
    smoothing models, 2906, 2907
Wong and Li’s test, 404
Wong and Li’s test for heteroscedasticity, 404
X-11 ARIMA methodology
    X11 procedure, 2252
X-11 seasonal adjustment method, see X11 procedure
X-11-ARIMA seasonal adjustment method, see X11 procedure
X-12 seasonal adjustment method, see X12 procedure
X-12-ARIMA seasonal adjustment method, see X12 procedure
X11 procedure
    BY groups, 2240
    Census X-11 method, 2228
    Census X-11 methodology, 2253
    data requirements, 2258
    differences with X11ARIMA/88, 2252
    ID variables, 2240, 2242
    irregular component, 2228, 2234
    model selection for X-11-ARIMA method, 2262
    output data sets, 2265, 2266
    output table names, 2279
    printed output, 2268
    seasonal adjustment, 2228, 2234
    seasonal component, 2228
    trading-day component, 2228, 2234
    trend cycle component, 2228, 2234
    X-11 ARIMA methodology, 2252
    X-11 seasonal adjustment method, 2228
    X-11-ARIMA seasonal adjustment method, 2228
X12 procedure
    BY groups, 2311
    Census X-12 method, 2296
    ID variables, 2311
    INPUT variables, 2313
    ODS Graphics, 2308
    seasonal adjustment, 2297
    seasonal component, 2297
    X-12 seasonal adjustment method, 2296
    X-12-ARIMA seasonal adjustment method, 2296
year-over-year
    percent change calculations, 109
yearly averages
    percent change calculations, 110
Yule-Walker
    AR initial conditions, 1141
Yule-Walker equations
    AUTOREG procedure, 371
    STATESPACE procedure, 1738
Yule-Walker estimates
    AUTOREG procedure, 371
    used to select state space models, 1738
    VARMAX procedure, 2092
Yule-Walker method
    as generalized least-squares, 374
Zellner estimation, see seemingly unrelated regression
Zellner’s two-stage method
    PANEL procedure, 1349
zooming graphs, 2721
Syntax Index

2SLS option
    FIT statement (MODEL), 1036, 1059
    PROC SYSLIN statement, 1783
3SLS option
    FIT statement (MODEL), 1036, 1061, 1160
    PROC SYSLIN statement, 1783
A option
    PROC SPECTRA statement, 1693
A= option
    FIXED statement (LOAN), 885
ABORT, 1215
ABS function, 1208
ACCEPTDEFAULT option
    AUTOMDL statement (X12), 2322
ACCUMULATE= option
    FORECAST statement (ESM), 733
    ID statement (ESM), 736
    ID statement (SIMILARITY), 1599
    ID statement (TIMESERIES), 1865
    INPUT statement (SIMILARITY), 1602
    TARGET statement (SIMILARITY), 1605
    VAR statement (TIMESERIES), 1875
ADDITIVE option
    MONTHLY statement (X11), 2241
    QUARTERLY statement (X11), 2246
ADDMAXIT= option
    MODEL statement (MDC), 939
ADDRANDOM option
    MODEL statement (MDC), 939
ADDVALUE option
    MODEL statement (MDC), 939
ADF= option
    ARM statement (LOAN), 890
ADJMEAN
    SPECTRA statement (TIMESERIES), 1870
ADJMEAN option
    PROC SPECTRA statement, 1693
ADJSMMV option
    FIT statement (MODEL), 1034
ADJUST statement
    X12 procedure, 2314
ADJUSTFREQ= option
    ARM statement (LOAN), 890
AGGMODE=RELAXED option
    LIBNAME statement (SASEHAVR), 2561
AGGMODE=STRICT option
    LIBNAME statement (SASEHAVR), 2561
ALIGN= option
    FORECAST statement (ARIMA), 147, 241
    ID statement (ENG), 147
    ID statement (ESM), 147, 737
    ID statement (HPF), 147
    ID statement (HPFDIAGNOSE), 147
    ID statement (HPFEVENTS), 147
    ID statement (SIMILARITY), 147, 1600
    ID statement (TIMESERIES), 147, 1866
    ID statement (UCM), 147, 1959
    ID statement (VARMAX), 147, 2084
    PROC DATASOURCE statement, 147, 582
    PROC EXPAND statement, 147, 773, 779
    PROC FORECAST statement, 147, 834
    TIMEID procedure, 1830
ALL option
    COMPARE statement (LOAN), 892
    MODEL statement (AUTOREG), 351
    MODEL statement (MDC), 940
    MODEL statement (PDLREG), 1403
    MODEL statement (SYSLIN), 1787
    PROC SYSLIN statement, 1784
    TEST statement (ENTROPY), 694
    TEST statement (MDC), 947
    TEST statement (MODEL), 1056
    TEST statement (PANEL), 1329
    TEST statement (QLIM), 1441
ALPHA option
    SPECTRA statement (TIMESERIES), 1870
ALPHA= option
    FORECAST statement (ARIMA), 241
    FORECAST statement (ESM), 733
    FORECAST statement (UCM), 1958
    IDENTIFY statement (ARIMA), 231
    MODEL statement, 1517
    MODEL statement (SYSLIN), 1787
    OUTLIER statement (ARIMA), 240
    OUTLIER statement (UCM), 1965
    OUTPUT statement (VARMAX), 2101
    PROC FORECAST statement, 834
    PROC SYSLIN statement, 1783
ALPHACLI= option
    OUTPUT statement (AUTOREG), 367
    OUTPUT statement (PDLREG), 1405
ALPHACLM= option
    OUTPUT statement (AUTOREG), 367
    OUTPUT statement (PDLREG), 1405
ALPHACSM= option
    OUTPUT statement (AUTOREG), 368
ALTPARM option
    ESTIMATE statement (ARIMA), 235, 257
ALTW option
    PROC SPECTRA statement, 1693
AMOUNT= option
    FIXED statement (LOAN), 885
AMOUNTPCT= option
    FIXED statement (LOAN), 886
AOCV= option
    OUTLIER statement (X12), 2325
APCT= option
    FIXED statement (LOAN), 886
%AR macro, 1147, 1148
AR option
    IRREGULAR statement (UCM), 1962
AR= option
    BOXCOXAR macro, 155
    DFTEST macro, 159
    ESTIMATE statement (ARIMA), 238
    LOGTEST macro, 160
    PROC FORECAST statement, 834
ARCHTEST option
    MODEL statement (AUTOREG), 351
ARCOS function, 1208
ARIMA procedure, 224
    syntax, 224
ARIMA procedure, PROC ARIMA statement
    PLOT option, 228
ARIMA statement
    X11 procedure, 2237
    X12 procedure, 2314
ARM statement
    LOAN procedure, 889
ARMACV= option
    AUTOMDL statement (X12), 2322
ARMAX= option
    PROC STATESPACE statement, 1731
ARSIN function, 1208
ARTEST= option
    MODEL statement (PANEL), 1324
ASCII option
    PROC DATASOURCE statement, 582
ASTART= option
    PROC FORECAST statement, 834
AT= option
    COMPARE statement (LOAN), 893
ATAN function, 1208
ATOL= option
    MODEL statement (PANEL), 1324
ATTRIBUTE statement
    DATASOURCE procedure, 589
AUTOMDL statement
    X12 procedure, 2320
AUTOREG procedure, 342
    syntax, 342
AUTOREG procedure, AUTOREG statement, 347
AUTOREG statement
    UCM procedure, 1949
AUXDATA= option
    PROC X12 statement, 2307
B option
    ARM statement (LOAN), 891
BACK= option
    ESTIMATE statement (UCM), 1955
    FORECAST statement (ARIMA), 241
    FORECAST statement (UCM), 1958
    OUTPUT statement (VARMAX), 2101
    PROC ESM statement, 730
    PROC STATESPACE statement, 1733
BACKCAST= option
    ARIMA statement (X11), 2237
BACKLIM= option
    ESTIMATE statement (ARIMA), 238
BACKSTEP option
    MODEL statement (AUTOREG), 360
BALANCED option
    AUTOMDL statement (X12), 2322
BALLOON statement
    LOAN procedure, 889
BALLOONPAYMENT= option
    BALLOON statement (LOAN), 889
BANDOPT= option
    MODEL statement (PANEL), 1324
BASE= option
    PROC PANEL statement, 1320
BCX option
    MODEL statement (QLIM), 1438
BDS option
    MODEL statement (AUTOREG), 351
BESTCASE option
    ARM statement (LOAN), 891
BI option
    COMPARE statement (LOAN), 893
BLOCK option
    PROC MODEL statement, 1022, 1227
BLOCKSEASON statement
    UCM procedure, 1950
BLOCKSIZE= option
    BLOCKSEASON statement (UCM), 1951
BLUS= option
    OUTPUT statement (AUTOREG), 368
BOUNDARYALIGN= option
    ID statement (TIMESERIES), 1867
BOUNDS statement
    COUNTREG procedure, 525
    ENTROPY procedure, 688
    MDC procedure, 935
    MODEL procedure, 1024
    QLIM procedure, 1432
BOXCOXAR
    macro, 155
    macro variable, 156
BP option
    COMPARE statement (LOAN), 893
    MODEL statement (PANEL), 1324, 1325
BREAKINTEREST option
    COMPARE statement (LOAN), 893
BREAKPAYMENT option
    COMPARE statement (LOAN), 893
BREUSCH= option
    FIT statement (MODEL), 1039
BSTART= option
    PROC FORECAST statement, 835
BTOL= option
    MODEL statement (PANEL), 1325
BTWNG option
    MODEL statement (PANEL), 1325
BUYDOWN statement
    LOAN procedure, 892
BUYDOWNRATES= option
    BUYDOWN statement (LOAN), 892
BY statement
    ARIMA procedure, 231
    AUTOREG procedure, 347
    COMPUTAB procedure, 480
    COUNTREG procedure, 525
    ENTROPY procedure, 690
    ESM procedure, 733
    EXPAND procedure, 775
    FORECAST procedure, 838
    MDC procedure, 936
    MODEL procedure, 1026
    PANEL procedure, 1320
    PDLREG procedure, 1402
    QLIM procedure, 1433
    SEVERITY procedure, 1514
    SIMILARITY procedure, 1598
    SIMLIN procedure, 1664
    SPECTRA procedure, 1694
    STATESPACE procedure, 1734
    SYSLIN procedure, 1785
    TIMEID procedure, 1829
    TIMESERIES procedure, 1860
    TSCSREG procedure, 1927
    UCM procedure, 1952
    VARMAX procedure, 2080
    X11 procedure, 2240
    X12 procedure, 2311
C= option
    MODEL statement, 1517
CANCORR option
    PROC STATESPACE statement, 1731
CAPS= option
    ARM statement (LOAN), 890
CAUCHY option
    ERRORMODEL statement (MODEL), 1030
CAUSAL statement
    VARMAX procedure, 2081
CDEC= option
    PROC COMPUTAB statement, 473
CDF= option
    ERRORMODEL statement (MODEL), 1031
CELL statement
    COMPUTAB procedure, 477
CENSORED option
    ENDOGENOUS statement (QLIM), 1435
    MODEL statement (ENTROPY), 691
CENTER
    SPECTRA statement (TIMESERIES), 1870
CENTER option
    ARIMA statement (X11), 2239
    IDENTIFY statement (ARIMA), 231
    MODEL statement (AUTOREG), 348
    MODEL statement (VARMAX), 2085
    PROC SPECTRA statement, 1693
CEV= option
    OUTPUT statement (AUTOREG), 368
CHAR option
    COLUMNS statement (COMPUTAB), 474
    ROWS statement (COMPUTAB), 476
CHARTS= option
    MONTHLY statement (X11), 2241
    QUARTERLY statement (X11), 2246
CHECK statement
    X12 procedure, 2315
CHECKBREAK option
    LEVEL statement (UCM), 1964
CHICR= option
    ARIMA statement (X11), 2237
CHISQUARED option
    ERRORMODEL statement (MODEL), 1030
CHOICE= option
    MODEL statement (MDC), 937
CHOW= option
    FIT statement (MODEL), 1039, 1131
    MODEL statement (AUTOREG), 352
CLAG option
    LAG statement (PANEL), 1323
CLAG statement
    LAG statement (PANEL), 1323
CLASS statement
    MDC procedure, 936
    PANEL procedure, 1320
CLEAR option
    IDENTIFY statement (ARIMA), 231
CLIMIT= option
    FORECAST command (TSFS), 2776
CMPMODEL options, 1021
COEF option
    MODEL statement (AUTOREG), 353
    MODEL statement (PDLREG), 1403
    PROC SPECTRA statement, 1694
COEF= option
    HETERO statement (AUTOREG), 363
COINTEG statement
    VARMAX procedure, 2082, 2164
COINTTEST= option
    MODEL statement (VARMAX), 2094
COINTTEST=(JOHANSEN) option
    MODEL statement (VARMAX), 2094
COINTTEST=(JOHANSEN=(IORDER=)) option
    MODEL statement (VARMAX), 2095, 2171
COINTTEST=(JOHANSEN=(NORMALIZE=)) option
    MODEL statement (VARMAX), 2095, 2157
COINTTEST=(JOHANSEN=(TYPE=)) option
    MODEL statement (VARMAX), 2095
COINTTEST=(SIGLEVEL=) option
    MODEL statement (VARMAX), 2096
COINTTEST=(SW) option
    MODEL statement (VARMAX), 2096, 2152
COINTTEST=(SW=(LAG=)) option
    MODEL statement (VARMAX), 2096
COINTTEST=(SW=(TYPE=)) option
    MODEL statement (VARMAX), 2096
COLLIN option
    ENTROPY procedure, 685
    FIT statement (MODEL), 1039
‘column headings’ option
    COLUMNS statement (COMPUTAB), 474
COLUMNS statement
    COMPUTAB procedure, 474
COMPARE statement
    LOAN procedure, 892
COMPOUND= option
    FIXED statement (LOAN), 886
COMPRESS= option
    TARGET statement (SIMILARITY), 1605
COMPUTAB procedure, 470
    syntax, 470
CONDITIONAL
    OUTPUT statement (QLIM), 1439
CONST= option
    BOXCOXAR macro, 155
    LOGTEST macro, 161
CONSTANT= option
    OUTPUT statement (AUTOREG), 368
    OUTPUT statement (PDLREG), 1405
CONTROL, 1254
CONTROL statement
    MODEL procedure, 1029, 1200
CONVERGE= option
    ARIMA statement (X11), 2238
    ENTROPY procedure, 687
    ESTIMATE statement (ARIMA), 238
    FIT statement (MODEL), 1041, 1078, 1086, 1088
    MODEL statement (AUTOREG), 360
    MODEL statement (MDC), 937
    MODEL statement (PDLREG), 1403
    PROC SYSLIN statement, 1783
    SOLVE statement (MODEL), 1054
CONVERT statement
    EXPAND procedure, 776
CONVERT= option
    LIBNAME statement (SASEFAME), 2504
COPULA= option
    SOLVE statement (MODEL), 1053
CORR option
    FIT statement (MODEL), 1039
    MODEL statement (PANEL), 1325
    MODEL statement (TSCSREG), 1928
CORR statement
    TIMESERIES procedure, 1861
CORRB option
    ESTIMATE statement (MODEL), 1032
    FIT statement (MODEL), 1040
    MODEL statement, 527
    MODEL statement (AUTOREG), 353
    MODEL statement (MDC), 941
    MODEL statement (PANEL), 1325
    MODEL statement (PDLREG), 1403
    MODEL statement (SYSLIN), 1787
    MODEL statement (TSCSREG), 1928
    PROC COUNTREG statement, 524
    QLIM procedure, 1431
CORROUT option
    PROC PANEL statement, 1318
    PROC QLIM statement, 1431
    PROC TSCSREG statement, 1926
CORRS option
    FIT statement (MODEL), 1040
COS function, 1208
COSH function, 1208
COST option
    ENDOGENOUS statement (QLIM), 1436
COUNTREG procedure, 521
    syntax, 521
COUNTREG procedure, CLASS statement, 526
COUNTREG procedure, FREQ statement, 526
COUNTREG procedure, WEIGHT statement, 530
COV option
    FIT statement (MODEL), 1040
COV3OUT option
    PROC SYSLIN statement, 1783
COVB option
    ESTIMATE statement (MODEL), 1032
    FIT statement (MODEL), 1040
    MODEL statement, 527
    MODEL statement (AUTOREG), 353
    MODEL statement (MDC), 940
    MODEL statement (PANEL), 1325
    MODEL statement (PDLREG), 1403
    MODEL statement (SYSLIN), 1787
    MODEL statement (TSCSREG), 1928
    PROC COUNTREG statement, 524
    PROC STATESPACE statement, 1732
    QLIM procedure, 1431
COVBEST= option
    ENTROPY procedure, 685
    FIT statement (MODEL), 1035, 1071
COVEST= option
    MODEL statement (AUTOREG), 353
    MODEL statement (MDC), 940
    PROC COUNTREG statement, 524
    QLIM procedure, 1431
COVOUT option
    ENTROPY procedure, 686
    FIT statement (MODEL), 1038
    PROC AUTOREG statement, 346
    PROC COUNTREG statement, 524
    PROC MDC statement, 934
    PROC PANEL statement, 1318
    PROC QLIM statement, 1431
    PROC SEVERITY statement, 1511
    PROC SYSLIN statement, 1783
    PROC TSCSREG statement, 1926
COVS option
    FIT statement (MODEL), 1040, 1076
CPEV= option
    OUTPUT statement (AUTOREG), 368
CRITERION= option
    MODEL statement, 1516
CROSS option
    PROC SPECTRA statement, 1694
CROSSCORR statement
    TIMESERIES procedure, 1862
CROSSCORR= option
    IDENTIFY statement (ARIMA), 232
CROSSLIST= option
    LIBNAME statement (SASEFAME), 2507
CROSSPLOTS= option
    PROC TIMESERIES statement, 1857
CROSSVAR statement
    TIMESERIES procedure, 1875
CRSPLINKPATH= option
    LIBNAME statement (SASECRSP), 2409
CSPACE= option
    PROC COMPUTAB statement, 473
CSTART= option
    PROC FORECAST statement, 835
CUSIP= option
    LIBNAME statement (SASECRSP), 2407
CUSUM= option
    OUTPUT statement (AUTOREG), 368
CUSUMLB= option
    OUTPUT statement (AUTOREG), 368
CUSUMSQ= option
    OUTPUT statement (AUTOREG), 368
CUSUMSQLB= option
    OUTPUT statement (AUTOREG), 368
CUSUMSQUB= option
    OUTPUT statement (AUTOREG), 368
CUSUMUB= option
    OUTPUT statement (AUTOREG), 368
CUTOFF= option
    SSPAN statement (X11), 2249
CV= option
    OUTLIER statement (X12), 2324
CWIDTH= option
    PROC COMPUTAB statement, 473
CYCLE statement
    UCM procedure, 1952
DASILVA option
    MODEL statement (PANEL), 1325
    MODEL statement (TSCSREG), 1929
DATA step
    DROP statement, 74
    IF statement, 73
    KEEP statement, 74
    WHERE statement, 73
DATA= option
    ENTROPY procedure, 685
    FIT statement (MODEL), 1037, 1154
    FORECAST command (TSFS), 2774, 2775
    IDENTIFY statement (ARIMA), 232
    PROC ARIMA statement, 227
    PROC AUTOREG statement, 346
    PROC COMPUTAB statement, 472
    PROC COUNTREG statement, 523
    PROC ESM statement, 730
    PROC EXPAND statement, 773
    PROC FORECAST statement, 835
    PROC MDC statement, 934
    PROC MODEL statement, 1020
    PROC PANEL statement, 1318
    PROC PDLREG statement, 1401
    PROC QLIM statement, 1431
    PROC SEVERITY statement, 1511
    PROC SIMILARITY statement, 1596
    PROC SIMLIN statement, 1663, 1670
    PROC SPECTRA statement, 1694
    PROC STATESPACE statement, 1730
    PROC SYSLIN statement, 1783
    PROC TIMEID statement, 1828
    PROC TIMESERIES statement, 1857
    PROC TSCSREG statement, 1926
    PROC UCM statement, 1946
    PROC VARMAX statement, 2077
    PROC X11 statement, 2236
    PROC X12 statement, 2305
    SOLVE statement (MODEL), 1050, 1200
    TSVIEW command (TSFS), 2774, 2775
DATASOURCE procedure, 580
    syntax, 580
DATE function, 148
DATE= option
    MONTHLY statement (X11), 2242
    PROC X12 statement, 2305
    QUARTERLY statement (X11), 2247
DATEJUL function, 95, 148
DATEPART function, 96, 148
DATETIME function, 148
DAY function, 95, 148
DBNAME= option
    PROC DATASOURCE statement, 582
DBTYPE= option
    PROC DATASOURCE statement, 582
DBVERSION= option
    LIBNAME statement (SASEFAME), 2507
DECOMP statement
    TIMESERIES procedure, 1863
DEGREE= option
    SPLINEREG statement (UCM), 1970
    SPLINESEASON statement (UCM), 1972
DELTA= option
    ESTIMATE statement (ARIMA), 239
DEPLAG statement
    UCM procedure, 1954
DETAILS option
    FIT statement (MODEL), 1096
    PROC MODEL statement, 1023
DETTOL= option
    PROC STATESPACE statement, 1732
DFPVALUE
    macro, 157
    macro variable, 158, 159
DFTEST
    macro, 158
DFTEST option
    MODEL statement (VARMAX), 2094, 2193
DFTEST=(DLAG=) option
    MODEL statement (VARMAX), 2094
DHMS function, 96, 148
DIAG= option
    FORECAST command (TSFS), 2777
DIF function, 104
    MODEL procedure, 107
DIF= option
    BOXCOXAR macro, 155
    DFTEST macro, 159
    INPUT statement (SIMILARITY), 1602
    LOGTEST macro, 161
    MODEL statement (VARMAX), 2085
    TARGET statement (SIMILARITY), 1606
    VAR statement (TIMESERIES), 1875
DIFF= option
    IDENTIFY statement (X12), 2319
DIFFORDER= option
    AUTOMDL statement (X12), 2321
DIFX= option
    MODEL statement (VARMAX), 2086
DIFY= option
    MODEL statement (VARMAX), 2086, 2205
DIMMAX= option
    PROC STATESPACE statement, 1732
DISCRETE option
    ENDOGENOUS statement (QLIM), 1434
DIST statement
    SEVERITY procedure, 1517
DIST= option
    COUNTREG statement (COUNTREG), 527
    MODEL statement (AUTOREG), 348
    MODEL statement (COUNTREG), 527
DISTRIBUTION= option
    ENDOGENOUS statement (QLIM), 1434
DLAG= option
    DFPVALUE macro, 157
    DFTEST macro, 159
DO, 1213
DOL option
    ROWS statement (COMPUTAB), 476
DOWNPAYMENT= option
    FIXED statement (LOAN), 886
DOWNPAYPCT= option
    FIXED statement (LOAN), 886
DP= option
    FIXED statement (LOAN), 886
DPCT= option
    FIXED statement (LOAN), 886
DROP statement
    DATASOURCE procedure, 585
DROP= option
    FIT statement (MODEL), 1034
    LIBNAME statement (SASEHAVR), 2559
DROPEVENT statement
    DATASOURCE procedure, 587
DROPGEOG1= option
    LIBNAME statement (SASEHAVR), 2560
DROPGEOG2= option
    LIBNAME statement (SASEHAVR), 2561
DROPGROUP= option
    LIBNAME statement (SASEHAVR), 2560
DROPH= option
    SEASON statement (UCM), 1967
DROPLONG= option
    LIBNAME statement (SASEHAVR), 2560
DROPSHORT= option
    LIBNAME statement (SASEHAVR), 2560
DROPSOURCE= option
    LIBNAME statement (SASEHAVR), 2560
DUAL option
    ENTROPY procedure, 687
DUL option
    ROWS statement (COMPUTAB), 476
DUPLICATES option
    TIMEID procedure, 1830
DW option
    FIT statement (MODEL), 1040
    MODEL statement (SYSLIN), 1787
DW= option
    MODEL statement (AUTOREG), 353
    MODEL statement (PDLREG), 1403
DWPROB option
    FIT statement (MODEL), 1040
    MODEL statement (AUTOREG), 353
    MODEL statement (PDLREG), 1403
DYNAMIC option
    FIT statement (MODEL), 1035, 1120, 1122
    SOLVE statement (MODEL), 1051, 1118, 1166
EBCDIC option
    PROC DATASOURCE statement, 582
ECM= option
    MODEL statement (VARMAX), 2098
ECM=(ECTREND) option
    MODEL statement (VARMAX), 2098, 2161
ECM=(NORMALIZE=) option
    MODEL statement (VARMAX), 2063, 2098
ECM=(RANK=) option
    MODEL statement (VARMAX), 2063, 2098
empirical distribution estimation
    ERRORMODEL statement (MODEL), 1073
EMPIRICAL= option
    ERRORMODEL statement (MODEL), 1031
EMPIRICALCDF= option
    MODEL statement, 1516
END= option
    ID statement (ESM), 737
    ID statement (SIMILARITY), 1601
    ID statement (TIMESERIES), 1867
    LIBNAME statement (SASEHAVR), 2559
    MONTHLY statement (X11), 2242
    QUARTERLY statement (X11), 2247
ENDOGENOUS statement
    MODEL procedure, 1029, 1200
    SIMLIN procedure, 1665
    SYSLIN procedure, 1785
ENTROPY procedure, 683
    syntax, 683
ENTRY= option
    FORECAST command (TSFS), 2776
EPSILON= option
    FIT statement (MODEL), 1042
ERRORMODEL statement
    MODEL procedure, 1030
ERRSTD
    OUTPUT statement (QLIM), 1439
ESACF option
    IDENTIFY statement (ARIMA), 232
ESM, 725
ESM procedure, 728
    syntax, 728
EST= option
    PROC SIMLIN statement, 1663, 1669
ESTDATA= option
    FIT statement (MODEL), 1037, 1155
    SOLVE statement (MODEL), 1050, 1170, 1198
ESTIMATE statement
    ARIMA procedure, 235
    MODEL procedure, 1031
    UCM procedure, 1955
    X12 procedure, 2316
ESTIMATEDCASE= option
    ARM statement (LOAN), 891
ESTPRINT option
    PROC SIMLIN statement, 1663
ESUPPORTS= option
    MODEL statement (ENTROPY), 691
EVENT statement
    X12 procedure, 2311
EXCLUDE= option
    FIT statement (MODEL), 1136
    INSTRUMENTS statement (MODEL), 1044
    MONTHLY statement (X11), 2242
EXOGENEITY option
    COINTEG statement (VARMAX), 2082, 2167
EXOGENOUS statement
    MODEL procedure, 1033, 1200
    SIMLIN procedure, 1665
EXP function, 1208
EXPAND procedure, 772
    CONVERT statement, 786
    syntax, 772
EXPAND= option
    TARGET statement (SIMILARITY), 1606
EXPECTED
    OUTPUT statement (QLIM), 1439
EXTRADIFFUSE= option
    ESTIMATE statement (UCM), 1955
    FORECAST statement (UCM), 1958
EXTRAPOLATE option
    PROC EXPAND statement, 774
F option
    ERRORMODEL statement (MODEL), 1030
FACTOR= option
    PROC EXPAND statement, 773, 778
FAMEOUT= option
    LIBNAME statement (SASEFAME), 2507
FAMEPRINT option
    PROC DATASOURCE statement, 582
FCMPOPT statement
    SIMILARITY procedure, 1599
FILETYPE= option
    PROC DATASOURCE statement, 582
FIML option
    FIT statement (MODEL), 1035, 1069, 1155, 1260
    PROC SYSLIN statement, 1783
FINAL= option
    X11 statement (X12), 2338
FIRST option
    PROC SYSLIN statement, 1784
FIT statement
    MODEL procedure, 1033
FIT statement, MODEL procedure
    GINV= option, 1035
FIXED statement
    LOAN procedure, 885
FIXEDCASE option
    ARM statement (LOAN), 891
FIXONE option
    MODEL statement (PANEL), 1325
    MODEL statement (TSCSREG), 1928
FIXONETIME option
    MODEL statement (PANEL), 1325
FIXTWO option
    MODEL statement (PANEL), 1325
    MODEL statement (TSCSREG), 1929
FLATDATA statement
    PANEL procedure, 1320
FLOW option
    PROC MODEL statement, 1023
FORCE= option
    X11 statement (X12), 2338
FORCE=FREQ option
    LIBNAME statement (SASEHAVR), 2561
FORECAST macro, 2775
FORECAST option
    SOLVE statement (MODEL), 1052, 1166, 1169
FORECAST procedure, 832
    syntax, 832
FORECAST statement
    ARIMA procedure, 241
    ESM procedure, 733
    UCM procedure, 1957
    X12 procedure, 2318
FORECAST= option
    ARIMA statement (X11), 2238
FORM statement
    STATESPACE procedure, 1734
FORM= option
    GARCH statement, 2099
FORMAT statement
    DATASOURCE procedure, 589
FORMAT= option
    ATTRIBUTE statement (DATASOURCE), 589
    COLUMNS statement (COMPUTAB), 475
    ID statement (ESM), 737
    ID statement (SIMILARITY), 1601
    ID statement (TIMESERIES), 1867
    ROWS statement (COMPUTAB), 477
    TIMEID procedure, 1830
FREQ= option
    LIBNAME statement (SASEHAVR), 2559
    PROC TIMEID statement, 1828
FREQUENCY= option
    PROC DATASOURCE statement, 583
FROM= option
    PROC EXPAND statement, 773, 778
FRONTIER option
    ENDOGENOUS statement (QLIM), 1435
FSRSQ option
    FIT statement (MODEL), 1040, 1060, 1137
FULLER option
    MODEL statement (TSCSREG), 1929
FULLWEIGHT= option
    MONTHLY statement (X11), 2242
    QUARTERLY statement (X11), 2247
FUNCTION= option
    TRANSFORM statement (X12), 2333
FUZZ= option
    PROC COMPUTAB statement, 472
GARCH statement
    VARMAX procedure, 2099
GARCH= option
    MODEL statement (AUTOREG), 349
GCE option
    ENTROPY procedure, 685
GCENM option
    ENTROPY procedure, 685
GCONV= option
    ENTROPY procedure, 687
GENERAL= option
    ERRORMODEL statement (MODEL), 1030
GENGMMV option
    FIT statement (MODEL), 1035
GEOG1= option
    LIBNAME statement (SASEHAVR), 2560
GEOG2= option
    LIBNAME statement (SASEHAVR), 2561
GETDER function, 1207
GINV option
    MODEL statement (AUTOREG), 353
    MODEL statement (PDLREG), 1403
GINV= option
    FIT statement (MODEL), 1035
    MODEL statement (PANEL), 1325
GME option
    ENTROPY procedure, 685
GMED option
    ENTROPY procedure, 685
GMENM option
    ENTROPY procedure, 685
GMM option
    FIT statement (MODEL), 1035, 1061, 1104, 1155, 1158–1160
    MODEL statement (PANEL), 1325
GODFREY option
    FIT statement (MODEL), 1040
    MODEL statement (AUTOREG), 353
GRAPH option
    PROC MODEL statement, 1022, 1227
GRID option
    ESTIMATE statement (ARIMA), 239
GRIDVAL= option
    ESTIMATE statement (ARIMA), 239
GROUP1 option
    CAUSAL statement (VARMAX), 2081
GROUP2 option
    CAUSAL statement (VARMAX), 2081
GROUP= option
    LIBNAME statement (SASEHAVR), 2560
GROUPS option
    SSA statement (TIMESERIES), 1871
GVKEY= option
    LIBNAME statement (SASECRSP), 2406
H= option
    COINTEG statement (VARMAX), 2082, 2164
HALTONSTART= option
    MODEL statement (MDC), 937
HAUSMAN option
    FIT statement (MODEL), 1040, 1130
HCCME= option
    FIT statement (MODEL), 1035
    MODEL statement (PANEL), 1326
HCUSIP= option
    LIBNAME statement (SASECRSP), 2407
HESSIAN= option
    FIT statement (MODEL), 1042, 1071
HETERO statement
    AUTOREG procedure, 362
HEV option
    MODEL statement (MDC), 937
HMS function, 96, 148
HOLIDAY function, 148
HORIZON= option
    FORECAST command (TSFS), 2776
HOUR function, 148
HRINITIAL option
    AUTOMDL statement (X12), 2322
HT= option
    OUTPUT statement (AUTOREG), 368
I option
    FIT statement (MODEL), 1041, 1093
    MODEL statement (PDLREG), 1403
    MODEL statement (SYSLIN), 1787
ID statement
    ENTROPY procedure, 690
    ESM procedure, 735
    EXPAND procedure, 777
    FORECAST procedure, 839
    MDC procedure, 936
    MODEL procedure, 1043
    PANEL procedure, 1321
    SIMILARITY procedure, 1599
    SIMLIN procedure, 1665
    STATESPACE procedure, 1734
    TIMEID procedure, 1829
    TIMESERIES procedure, 1865
    TSCSREG procedure, 1927
    UCM procedure, 1959
    VARMAX procedure, 2084
    X11 procedure, 2240
    X12 procedure, 2311
ID= option
    FORECAST command (TSFS), 2774, 2775
    FORECAST statement (ARIMA), 241
    OUTLIER statement (ARIMA), 240
    TSVIEW command (TSFS), 2774, 2775
IDENTIFY statement
    ARIMA procedure, 231, 240
    X12 procedure, 2318
IDENTITY statement
    SYSLIN procedure, 1786
IF, 1213
INCLUDE, 1217
INCLUDE statement
    MODEL procedure, 1043
INDEX option
    PROC DATASOURCE statement, 582
INDID= option
    PROC PANEL statement, 1321
INDNO= option
    LIBNAME statement (SASECRSP), 2409
INEST= option
    PROC SEVERITY statement, 1512
INEVENT= option
    PROC X12 statement, 2307
INFILE= option
    PROC DATASOURCE statement, 582
INIT statement
    COMPUTAB procedure, 478
    COUNTREG procedure, 526
    QLIM procedure, 1438
INIT= option
    DIST statement, 1518
    FIXED statement (LOAN), 887
INITIAL statement
    STATESPACE procedure, 1735
INITIAL= option
    FIT statement (MODEL), 1034, 1122
    FIXED statement (LOAN), 887
    MODEL statement (AUTOREG), 361
    MODEL statement (MDC), 941
INITIALPCT= option
    FIXED statement (LOAN), 887
INITMISS option
    PROC COMPUTAB statement, 472
INITPCT= option
    FIXED statement (LOAN), 887
INITVAL= option
    ESTIMATE statement (ARIMA), 238
INPUT statement
    SIMILARITY procedure, 1602
    X12 procedure, 2313
INPUT= option
    ESTIMATE statement (ARIMA), 236, 256
INSET= option
    LIBNAME statement (SASECRSP), 2410
    LIBNAME statement (SASEFAME), 2505, 2507
INSTRUMENTS statement
    MODEL procedure, 1043, 1134
    SYSLIN procedure, 1786
INTCINDEX function, 148
INTCK function, 98, 148
INTCYCLE function, 149
INTEGRATE= option
    MODEL statement (MDC), 937
INTERIM= option
    PROC SIMLIN statement, 1663
INTERVAL= option
    FIXED statement (LOAN), 887
    FORECAST command (TSFS), 2774, 2776
    FORECAST statement (ARIMA), 242, 263
    ID statement (ESM), 737
    ID statement (SIMILARITY), 1601
    ID statement (TIMESERIES), 1867
    ID statement (UCM), 1959
    ID statement (VARMAX), 2084
    PROC DATASOURCE statement, 583
    PROC FORECAST statement, 835
    PROC STATESPACE statement, 1733
    PROC X12 statement, 2306
    TIMEID procedure, 1830
    TSVIEW command (TSFS), 2774, 2776
INTFIT function, 149
INTFMT function, 149
INTGET function, 149
INTGPRINT option
    SOLVE statement (MODEL), 1054
INTINDEX function, 150
INTNX function, 97, 150
INTONLY option
    INSTRUMENTS statement (MODEL), 1045
INTORDER= option
    MODEL statement (MDC), 937
INTPER= option
    PROC FORECAST statement, 835
    PROC STATESPACE statement, 1733
INTSEAS function, 150
INTSHIFT function, 151
INTTEST function, 151
IRREGULAR statement
    UCM procedure, 1960
IT2SLS option
    FIT statement (MODEL), 1036
IT3SLS option
    FIT statement (MODEL), 1036
    PROC SYSLIN statement, 1784
ITALL option
    FIT statement (MODEL), 1041, 1094
ITDETAILS option
    FIT statement (MODEL), 1041, 1092
ITGMM option
    FIT statement (MODEL), 1036, 1065
    MODEL statement (PANEL), 1326
ITOLS option
    FIT statement (MODEL), 1036
ITPRINT option
    ENTROPY procedure, 686
    ESTIMATE statement (X12), 2317
    FIT statement (MODEL), 1041, 1086, 1092
    MODEL statement, 528
    MODEL statement (AUTOREG), 353
    MODEL statement (MDC), 941
    MODEL statement (PANEL), 1326
    MODEL statement (PDLREG), 1403
    PROC STATESPACE statement, 1732
    PROC SYSLIN statement, 1785
    QLIM procedure, 1431
    SOLVE statement (MODEL), 1054, 1194
ITSUR option
    FIT statement (MODEL), 1036, 1060
    PROC SYSLIN statement, 1784
J= option
    COINTEG statement (VARMAX), 2083
JACOBI option
    SOLVE statement (MODEL), 1052
JULDATE function, 95, 151
K option
    PROC SPECTRA statement, 1694
K= option
    MODEL statement (SYSLIN), 1787
    PROC SYSLIN statement, 1784
KEEP statement
    DATASOURCE procedure, 585
KEEP= option
    FORECAST command (TSFS), 2777
    LIBNAME statement (SASEHAVR), 2559
    PROC PANEL statement, 1321
KEEPEVENT statement
    DATASOURCE procedure, 586
KEEPH= option
    SEASON statement (UCM), 1968
KERNEL option
    FIT statement (MODEL), 1158
    SPECTRA statement (TIMESERIES), 1870
KERNEL= option
    FIT statement (MODEL), 1036, 1062
KLAG= option
    PROC STATESPACE statement, 1732
KNOTS= option
    SPLINEREG statement (UCM), 1970
    SPLINESEASON statement (UCM), 1972
L= option
    FIXED statement (LOAN), 885
_LABEL_ option
    COLUMNS statement (COMPUTAB), 474
    ROWS statement (COMPUTAB), 476
LABEL statement
    DATASOURCE procedure, 590
    MODEL procedure, 1045
LABEL= option
    ATTRIBUTE statement (DATASOURCE), 589
    FIXED statement (LOAN), 887
LAG function, 104
    MODEL procedure, 107
LAG statement
    PANEL procedure, 1323
LAGDEP option
    MODEL statement (AUTOREG), 354
    MODEL statement (PDLREG), 1403
LAGDEP= option
    MODEL statement (AUTOREG), 354
    MODEL statement (PDLREG), 1403
LAGDV option
    MODEL statement (AUTOREG), 354
    MODEL statement (PDLREG), 1403
LAGDV= option
    MODEL statement (AUTOREG), 354
    MODEL statement (PDLREG), 1403
LAGGED statement
    SIMLIN procedure, 1665
LAGMAX= option
    MODEL statement (VARMAX), 2088
    PROC STATESPACE statement, 1730
LAGRANGE option
    TEST statement (ENTROPY), 694
    TEST statement (MODEL), 1056
LAGS= option
    CORR statement (TIMESERIES), 1862
    CROSSCORR statement (TIMESERIES), 1863
    DEPLAG statement (UCM), 1954
LAMBDA= option
    DECOMP statement (TIMESERIES), 1864
LAMBDAHI= option
    BOXCOXAR macro, 155
LAMBDALO= option
    BOXCOXAR macro, 155
LCL= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PDLREG), 1405
LCLM= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PDLREG), 1405
LDW option
    MODEL statement (AUTOREG), 361
LEAD= option
    FORECAST statement (ARIMA), 242
    FORECAST statement (UCM), 1958
    FORECAST statement (X12), 2318
    OUTPUT statement (VARMAX), 2101
    PROC ESM statement, 730
    PROC FORECAST statement, 835
    PROC STATESPACE statement, 1733
LEFTTRUNCATED= option
    MODEL statement, 1515
LENGTH
    SSA statement (TIMESERIES), 1872
LENGTH option
    MONTHLY statement (X11), 2243
LENGTH statement
    DATASOURCE procedure, 590
LENGTH= option
    ATTRIBUTE statement (DATASOURCE), 589
    SEASON statement (UCM), 1968
    SPLINESEASON statement (UCM), 1972
LEVEL statement
    UCM procedure, 1963
LIBNAME libref SASECRSP statement, 2403
LIFE= option
    FIXED statement (LOAN), 885
LIKE option
    TEST statement (ENTROPY), 694
    TEST statement (MODEL), 1056
LIMIT1= option
    MODEL statement (QLIM), 1438
LIML option
    PROC SYSLIN statement, 1784
LINK= option
    HETERO statement (AUTOREG), 363
LIST option
    FIT statement (MODEL), 1246
    PROC MODEL statement, 1022, 1218
LISTALL option
    PROC MODEL statement, 1022
LISTCODE option
    PROC MODEL statement, 1022, 1220
LISTDEP option
    PROC MODEL statement, 1022, 1223
LISTDER option
    PROC MODEL statement, 1022
LJC option
    COLUMNS statement (COMPUTAB), 475
    ROWS statement (COMPUTAB), 477
LJUNGBOXLIMIT= option
    AUTOMDL statement (X12), 2322
LM option
    TEST statement (ENTROPY), 694
    TEST statement (MDC), 947
    TEST statement (MODEL), 1056
    TEST statement (PANEL), 1329
    TEST statement (QLIM), 1441
LOAN procedure, 882
    syntax, 882
LOG function, 1208
LOG10 function, 1208
LOG2 function, 1208
LOGLIKL option
    MODEL statement (AUTOREG), 354
LOGNORMALPARM= option
    MODEL statement (MDC), 938
LOGTEST
    macro, 160
    macro variable, 161
LONG= option
    LIBNAME statement (SASEHAVR), 2560
LOWERBOUND= option
    ENDOGENOUS statement (QLIM), 1435
LR option
    TEST statement (ENTROPY), 694
    TEST statement (MDC), 947
    TEST statement (MODEL), 1056
    TEST statement (PANEL), 1329
    TEST statement (QLIM), 1441
LRECL= option
    PROC DATASOURCE statement, 583
LSCV= option
    OUTLIER statement (X12), 2325
LTEBOUND= option
    FIT statement (MODEL), 1042, 1196
    MODEL statement (MODEL), 1196
    SOLVE statement (MODEL), 1196
M= option
    MODEL statement (PANEL), 1326
    MODEL statement (TSCSREG), 1929
%MA macro, 1150, 1151
MA= option
    ESTIMATE statement (ARIMA), 238
MACURVES statement
    X11 procedure, 2240
MAPECR= option
    ARIMA statement (X11), 2238
MARGINAL
    OUTPUT statement (QLIM), 1439
MARGINALS option
    MODEL statement (ENTROPY), 691
MARKOV option
    ENTROPY procedure, 685
MARR= option
    COMPARE statement (LOAN), 893
MAXAD= option
    ARM statement (LOAN), 890
MAXADJUST= option
    ARM statement (LOAN), 890
MAXBAND= option
    MODEL statement (PANEL), 1326
MAXDIFF= option
    AUTOMDL statement (X12), 2321
MAXERROR= option
    PROC ESM statement, 730
    PROC TIMEID statement, 1828
    PROC TIMESERIES statement, 1857
MAXERRORS= option
    PROC FORECAST statement, 835
    PROC MODEL statement, 1023
MAXIT= option
    ESTIMATE statement (ARIMA), 239
    PROC STATESPACE statement, 1732
    PROC SYSLIN statement, 1784
MAXITER= option
    ARIMA statement (X11), 2238
    ENTROPY procedure, 688
    ESTIMATE statement (ARIMA), 239
    ESTIMATE statement (X12), 2317
    FIT statement (MODEL), 1042
    MODEL statement (AUTOREG), 361
    MODEL statement (MDC), 941
    MODEL statement (PANEL), 1326
    MODEL statement (PDLREG), 1403
    PROC SYSLIN statement, 1784
    SOLVE statement (MODEL), 1054
MAXLAG= option
    CHECK statement (X12), 2315
    IDENTIFY statement (X12), 2319
MAXNUM= option
    OUTLIER statement (ARIMA), 240
    OUTLIER statement (UCM), 1965
MAXORDER= option
    AUTOMDL statement (X12), 2320
MAXPCT= option
    OUTLIER statement (ARIMA), 240
    OUTLIER statement (UCM), 1965
MAXR= option
    ARM statement (LOAN), 890
MAXRATE= option
    ARM statement (LOAN), 890
MAXSUBITER= option
    ENTROPY procedure, 688
    FIT statement (MODEL), 1042, 1078
    SOLVE statement (MODEL), 1054
MDC procedure, 932
    syntax, 932
MDC procedure, MODEL statement
    ADDMAXIT= option, 939
    ADDRANDOM option, 939
    ADDVALUE option, 939
    ALL option, 940
    CHOICE= option, 937
    CONVERGE= option, 937
    CORRB option, 941
    COVB option, 940
    COVEST= option, 940
    HALTONSTART= option, 937
    HEV= option, 937
    INITIAL= option, 941
    ITPRINT option, 941
    LOGNORMALPARM= option, 938
    MAXITER= option, 941
    MIXED= option, 937
    NCHOICE option, 938
    NOPRINT option, 941
    NORMALEC= option, 938
    NORMALPARM= option, 938
    NSIMUL option, 938
    OPTMETHOD= option, 941
    RANDINIT option, 938
    RANDNUM= option, 938
    RANK option, 939
    RESTART= option, 939
    SAMESCALE option, 939
    SEED= option, 940
    SPSCALE option, 940
    TYPE= option, 940
    UNIFORMEC= option, 938
    UNIFORMPARM= option, 938
    UNITVARIANCE= option, 940
MDC procedure, OUTPUT statement
    OUT= option, 945
    P= option, 946
    XBETA= option, 946
MDC procedure, PROC MDC statement
    COVOUT option, 934
    DATA= option, 934
    OUTEST= option, 934
MDC procedure, TEST statement, 947
MDCDATA statement, 934
MDLINFOIN= option
    PROC X12 statement, 2307
MDLINFOOUT= option
    PROC X12 statement, 2307
MDY function, 95, 151
MEAN= option
    MODEL statement (AUTOREG), 350
MEASURE= option
    TARGET statement (SIMILARITY), 1607
MEDIAN option
    FORECAST statement (ESM), 734
MELO option
    PROC SYSLIN statement, 1784
MEMORYUSE option
    PROC MODEL statement, 1024
METHOD= option
    ARIMA statement (X11), 2238
    CONVERT statement (EXPAND), 776, 783
    ENTROPY procedure, 688
    ESTIMATE statement (ARIMA), 236
    FIT statement (MODEL), 1042, 1077
    MODEL statement (AUTOREG), 362
    MODEL statement (PDLREG), 1404
    MODEL statement (VARMAX), 2086
    PROC COUNTREG statement, 524
    PROC EXPAND statement, 774, 783
    PROC FORECAST statement, 835
    QLIM procedure, 1432
MILLS
    OUTPUT statement (QLIM), 1439
MINIC option
    IDENTIFY statement (ARIMA), 232
    PROC STATESPACE statement, 1731
MINIC= option
    MODEL statement (VARMAX), 2093
MINIC=(P=) option
    MODEL statement (VARMAX), 2093, 2132
MINIC=(PERROR=) option
    MODEL statement (VARMAX), 2093
MINIC=(Q=) option
    MODEL statement (VARMAX), 2093, 2132
MINIC=(TYPE=) option
    MODEL statement (VARMAX), 2093
MINR= option
    ARM statement (LOAN), 890
MINRATE= option
    ARM statement (LOAN), 890
MINTIMESTEP= option
    FIT statement (MODEL), 1042, 1196
    MODEL statement (MODEL), 1196
    SOLVE statement (MODEL), 1196
MINUTE function, 151
MISSING= option
    FIT statement (MODEL), 1037
MIXED option
    MODEL statement (MDC), 937
MODE= option
    DECOMP statement (TIMESERIES), 1864
    X11 statement (X12), 2336
MODEL procedure, 1012
    syntax, 1012
MODEL statement
    AUTOREG procedure, 348
    COUNTREG procedure, 527
    ENTROPY procedure, 691
    MDC procedure, 936
    PANEL procedure, 1323, 1324
    PDLREG procedure, 1402
    QLIM procedure, 1438
    SEVERITY procedure, 1514
    SYSLIN procedure, 1786
    TSCSREG procedure, 1928
    UCM procedure, 1964
    VARMAX procedure, 2084
MODEL= option
    ARIMA statement (X11), 2238
    ARIMA statement (X12), 2314
    FORECAST statement (ESM), 734
    PROC MODEL statement, 1021, 1217
MOMENT statement
    MODEL procedure, 1045
MONTH function, 95, 151
MONTHLY statement
    X11 procedure, 2241
MTITLE= option
    COLUMNS statement (COMPUTAB), 474
MU= option
    ESTIMATE statement (ARIMA), 238
+n option
    COLUMNS statement (COMPUTAB), 475
    ROWS statement (COMPUTAB), 476
N2SLS option
    FIT statement (MODEL), 1036
N3SLS option
    FIT statement (MODEL), 1036
NAHEAD= option
    SOLVE statement (MODEL), 1052, 1166, 1167
_NAME_ option
    COLUMNS statement (COMPUTAB), 475
    ROWS statement (COMPUTAB), 476
NBACKCAST= option
    FORECAST statement (ESM), 734
NBLOCKS= option
    BLOCKSEASON statement (UCM), 1951
NBYOBS= option
    PROC TIMEID statement, 1828
NCHOICE option
    MODEL statement (MDC), 938
NDEC= option
    MONTHLY statement (X11), 2243
    PROC MODEL statement, 1023
    QUARTERLY statement (X11), 2248
    SSPAN statement (X11), 2249
NDRAW option
    FIT statement (MODEL), 1036
NDRAW= option
    QLIM procedure, 1432
NEST statement
    MDC procedure, 942
NESTIT option
    FIT statement (MODEL), 1042, 1077
NEWTON option
    SOLVE statement (MODEL), 1052
NKNOTS= option
    SPLINEREG statement (UCM), 1971
NLAG= option
    CORR statement (TIMESERIES), 1862
    CROSSCORR statement (TIMESERIES), 1863
    IDENTIFY statement (ARIMA), 233
    MODEL statement (AUTOREG), 348
    MODEL statement (PDLREG), 1404
NLAGS= option
    PROC FORECAST statement, 834
NLAMBDA= option
    BOXCOXAR macro, 155
NLOPTIONS statement
    AUTOREG procedure, 364
    COUNTREG procedure, 528
    MDC procedure, 945
    QLIM procedure, 1439
    SEVERITY procedure, 1518
    UCM procedure, 1964
    VARMAX procedure, 2100, 2144
NO2SLS option
    FIT statement (MODEL), 1036
NO3SLS option
    FIT statement (MODEL), 1036
NOCENTER option
    PROC STATESPACE statement, 1731
NOCOMPRINT option
    COMPARE statement (LOAN), 894
NOCONST option
    HETERO statement (AUTOREG), 364
NOCONSTANT option
    ESTIMATE statement (ARIMA), 236
NOCURRENTX option
    MODEL statement (VARMAX), 2086
NODF option
    ESTIMATE statement (ARIMA), 236
NODIFFS option
    MODEL statement (PANEL), 1326
NOEST option
    AUTOREG statement (UCM), 1949
    BLOCKSEASON statement (UCM), 1951
    CYCLE statement (UCM), 1953
    DEPLAG statement (UCM), 1954
    ESTIMATE statement (ARIMA), 238
    IRREGULAR statement (UCM), 1960, 1962
    LEVEL statement (UCM), 1964
    PROC STATESPACE statement, 1732
    RANDOMREG statement (UCM), 1966
    SEASON statement (UCM), 1968
    SLOPE statement (UCM), 1969
    SPLINEREG statement (UCM), 1971
    SPLINESEASON statement (UCM), 1972
NOESTIM option
    MODEL statement (PANEL), 1326
NOGENGMMV option
    FIT statement (MODEL), 1036
NOINCLUDE option
    PROC SYSLIN statement, 1784
NOINT option
    ARIMA statement (X11), 2239
    AUTOMDL statement (X12), 2321
    ESTIMATE statement (ARIMA), 236
    INSTRUMENTS statement (MODEL), 1045
    MODEL statement (AUTOREG), 348, 350
    MODEL statement (COUNTREG), 527
    MODEL statement (ENTROPY), 691
    MODEL statement (PANEL), 1326
    MODEL statement (PDLREG), 1404
    MODEL statement (QLIM), 1438
    MODEL statement (SYSLIN), 1787
    MODEL statement (TSCSREG), 1929
    MODEL statement (VARMAX), 2087
NOINTERCEPT option
    INSTRUMENTS statement (MODEL), 1045
NOLEVELS option
    MODEL statement (PANEL), 1326
NOLS option
    ESTIMATE statement (ARIMA), 239
NOMEAN option
    MODEL statement (TSCSREG), 1929
NOMISS option
    IDENTIFY statement (ARIMA), 233
    MODEL statement (AUTOREG), 362
NONORMALIZE option
    WEIGHT statement (COUNTREG), 530
    WEIGHT statement (QLIM), 1443
NOOLS option
    FIT statement (MODEL), 1036
NOOUTALL option
    FORECAST statement (ARIMA), 242
    PROC ESM statement, 730
NOP option
    FIXED statement (LOAN), 888
NOPRINT option
    ARIMA statement (X11), 2239
    COLUMNS statement (COMPUTAB), 475
    ESTIMATE statement (ARIMA), 236
    FIXED statement (LOAN), 888
    FORECAST statement (ARIMA), 242
    IDENTIFY statement (ARIMA), 233
    MODEL statement (AUTOREG), 354
    MODEL statement (MDC), 941
    MODEL statement (PANEL), 1326
    MODEL statement (PDLREG), 1404
    MODEL statement (SYSLIN), 1787
    MODEL statement (TSCSREG), 1929
    MODEL statement (VARMAX), 2088
    OUTPUT statement (VARMAX), 2101
    PROC COMPUTAB statement, 473
    PROC COUNTREG statement, 524
    PROC ENTROPY statement, 686
    PROC MODEL statement, 1023
    PROC QLIM statement, 1431
    PROC SEVERITY statement, 1512
    PROC SIMLIN statement, 1663
    PROC STATESPACE statement, 1730
    PROC SYSLIN statement, 1785
    PROC UCM statement, 1946
    PROC VARMAX statement (VARMAX), 2179
    PROC X11 statement, 2236
    ROWS statement (COMPUTAB), 477
    SSPAN statement (X11), 2249
NOPROFILE
    ESTIMATE statement (UCM), 1955
NORED option
    PROC SIMLIN statement, 1663
NORMAL option
    ERRORMODEL statement (MODEL), 1030
    FIT statement (MODEL), 1041
    MODEL statement (AUTOREG), 354
NORMALEC= option
    MODEL statement (MDC), 938
NORMALIZE= option
    COINTEG statement (VARMAX), 2083, 2193
    INPUT statement (SIMILARITY), 1603
    TARGET statement (SIMILARITY), 1608
NORMALPARM= option
    MODEL statement (MDC), 938
NORTR option
    PROC COMPUTAB statement, 473
NOSTABLE option
    ESTIMATE statement (ARIMA), 239
NOSTORE option
    PROC MODEL statement, 1021
NOSUM
    TABLES statement (X12), 2332
NOSUMMARYPRINT option
    FIXED statement (LOAN), 888
NOSUMPR option
    FIXED statement (LOAN), 888
NOTFSTABLE option
    ESTIMATE statement (ARIMA), 239
NOTRANS option
    PROC COMPUTAB statement, 473
NOTRANSPOSE option
    PROC COMPUTAB statement, 472
NOTSORTED option
    ID statement (ESM), 737
    ID statement (SIMILARITY), 1601
    ID statement (TIMESERIES), 1867
    TIMEID procedure, 1830
NOZERO option
    COLUMNS statement (COMPUTAB), 475
    ROWS statement (COMPUTAB), 477
NPARMS= option
    CORR statement (TIMESERIES), 1862
NPERIODS option
    SSA statement (TIMESERIES), 1872
NPERIODS= option
    DECOMP statement (TIMESERIES), 1864
    TREND statement (TIMESERIES), 1874
NPREOBS option
    FIT statement (MODEL), 1037
NSEASON= option
    MODEL statement (VARMAX), 2087
NSIMUL option
    MODEL statement (MDC), 938
NSSTART= MAX option
    PROC FORECAST statement, 836
NSSTART= option
    PROC FORECAST statement, 836
NSTART= MAX option
    PROC FORECAST statement, 836
NSTART= option
    PROC FORECAST statement, 836
NVDRAW option
    FIT statement (MODEL), 1037
NWKDOM function, 151
OBSERVED= option
    CONVERT statement (EXPAND), 777, 781
    PROC EXPAND statement, 774
OFFSET= option
    BLOCKSEASON statement (UCM), 1951
    MODEL statement (COUNTREG), 527
    SPLINESEASON statement (UCM), 1972
OL option
    ROWS statement (COMPUTAB), 477
OLS option
    FIT statement (MODEL), 1037, 1229
    PROC SYSLIN statement, 1784
ONEPASS option
    SOLVE statement (MODEL), 1053
OPTIONS option
    PROC COMPUTAB statement, 473
OPTMETHOD= option
    MODEL statement (AUTOREG), 362
    MODEL statement (MDC), 941
ORDER= option
    ENDOGENOUS statement (QLIM), 1434
    PROC SIMILARITY statement, 1596
OTHERWISE, 1215
OUT1STEP option
    PROC FORECAST statement, 837
OUT= option
    BOXCOXAR macro, 155
    DFTEST macro, 159
    ENTROPY procedure, 686
    FIT statement (MODEL), 1037, 1160, 1231
    FIXED statement (LOAN), 888, 897
    FLATDATA statement (PANEL), 1321
    FORECAST command (TSFS), 2777
    FORECAST statement (ARIMA), 242, 265
    LOGTEST macro, 161
    OUTPUT statement (AUTOREG), 367
    OUTPUT statement (COUNTREG), 528
    OUTPUT statement (MDC), 945
    OUTPUT statement (PANEL), 1328
    OUTPUT statement (PDLREG), 1405
    OUTPUT statement (QLIM), 1440
    OUTPUT statement (SIMLIN), 1666
    OUTPUT statement (SYSLIN), 1803
    OUTPUT statement (VARMAX), 2101, 2178
    OUTPUT statement (X11), 2245, 2265
    OUTPUT statement (X12), 2323
    PROC ARIMA statement, 230
    PROC COMPUTAB statement, 473
    PROC DATASOURCE statement, 583, 592
    PROC ESM statement, 730
    PROC EXPAND statement, 773, 801
    PROC FORECAST statement, 836, 850
    PROC SIMILARITY statement, 1596
    PROC SIMLIN statement, 1671
    PROC SPECTRA statement, 1694, 1700
    PROC STATESPACE statement, 1733, 1749
    PROC SYSLIN statement, 1783
    PROC TIMESERIES statement, 1858
    SOLVE statement (MODEL), 1050, 1170, 1199
    TEST statement (ENTROPY), 694
    TEST statement (MODEL), 1056
OUTACTUAL option
    FIT statement (MODEL), 1038
    PROC FORECAST statement, 836
    SOLVE statement (MODEL), 1050
OUTALL option
    FIT statement (MODEL), 1038
    PROC FORECAST statement, 836
    SOLVE statement (MODEL), 1051
OUTALL= option
    PROC DATASOURCE statement, 584, 596
OUTAR= option
    PROC STATESPACE statement, 1731, 1749
OUTBY= option
    PROC DATASOURCE statement, 584, 595
OUTCDF= option
    PROC SEVERITY statement, 1512
OUTCOMP= option
    COMPARE statement (LOAN), 894, 898
OUTCONT= option
    PROC DATASOURCE statement, 584, 594
OUTCORR option
    ESTIMATE statement (ARIMA), 237
    PROC PANEL statement, 1318
    PROC TSCSREG statement, 1926
OUTCORR= option
    PROC TIMESERIES statement, 1858
OUTCOV option
    ENTROPY procedure, 686
    ESTIMATE statement (ARIMA), 237
    ESTIMATE statement (MODEL), 1032
    FIT statement (MODEL), 1038
    PROC PANEL statement, 1318
    PROC SYSLIN statement, 1783
    PROC TSCSREG statement, 1926
    PROC VARMAX statement, 2078, 2180
OUTCOV3 option
    PROC SYSLIN statement, 1783
OUTCOV= option
    IDENTIFY statement (ARIMA), 233, 267
OUTCROSSCORR= option
    PROC TIMESERIES statement, 1858
OUTDECOMP= option
    PROC TIMESERIES statement, 1858
OUTERRORS option
    SOLVE statement (MODEL), 1051
OUTEST= option
    ENTROPY procedure, 686
    ENTROPY statement, 708
    ESTIMATE statement (ARIMA), 237, 267
    ESTIMATE statement (MODEL), 1032
    ESTIMATE statement (UCM), 1956
    FIT statement (MODEL), 1038, 1161
    PROC AUTOREG statement, 346
    PROC COUNTREG statement, 524
    PROC ESM statement, 731
    PROC EXPAND statement, 773, 801
    PROC FORECAST statement, 837, 852
    PROC MDC statement, 934
    PROC PANEL statement, 1318, 1368
    PROC QLIM statement, 1431
    PROC SEVERITY statement, 1511
    PROC SIMLIN statement, 1664, 1670
    PROC SYSLIN statement, 1783, 1803
    PROC TSCSREG statement, 1926
    PROC VARMAX statement, 2077, 2180
OUTESTALL option
    PROC FORECAST statement, 837
OUTESTTHEIL option
    PROC FORECAST statement, 837
OUTEVENT= option
    PROC DATASOURCE statement, 584, 597
OUTEXTRAP option
    PROC X11 statement, 2236
OUTFITSTATS option
    PROC FORECAST statement, 837
OUTFOR= option
    FORECAST statement (UCM), 1958
    PROC ESM statement, 731
OUTFORECAST option
    X11 statement (X12), 2336
OUTFULL option
    PROC FORECAST statement, 837
OUTHT= option
    GARCH statement, 2099
    PROC VARMAX statement, 2182
OUTINTERVAL= option
    PROC TIMEID statement, 1828
OUTINTERVALDETAILS= option
    PROC TIMEID statement, 1828
OUTL= option
    ENTROPY procedure, 686
    ENTROPY statement, 709
OUTLAGS option
    FIT statement (MODEL), 1038
    SOLVE statement (MODEL), 1051
OUTLIER statement
    UCM procedure, 1965
    X12 procedure, 2324
OUTLIMIT option
    PROC FORECAST statement, 837
OUTMEASURE= option
    PROC SIMILARITY statement, 1596
OUTMODEL= option
    ESTIMATE statement (ARIMA), 237, 270
    PROC MODEL statement, 1021, 1216
    PROC STATESPACE statement, 1732, 1750
OUTMODELINFO= option
    PROC SEVERITY statement, 1512
OUTP= option
    ENTROPY procedure, 686
    ENTROPY statement, 708
OUTPARMS= option
    FIT statement (MODEL), 1162
    PROC MODEL statement, 1020, 1157
OUTPATH= option
    PROC SIMILARITY statement, 1596
OUTPREDICT option
    FIT statement (MODEL), 1038
    SOLVE statement (MODEL), 1051
OUTPROCINFO= option
    PROC ESM statement, 731
OUTPUT
    OUT=, 410
OUTPUT statement
    AUTOREG procedure, 367
    COUNTREG procedure, 528
    PANEL procedure, 1328
    PDLREG procedure, 1404
    PROC PANEL statement, 1368
    QLIM procedure, 1439
    SIMLIN procedure, 1666
    SYSLIN procedure, 1788
    VARMAX procedure, 2101
    X11 procedure, 2245
    X12 procedure, 2323
OUTRESID option
    FIT statement (MODEL), 1038, 1231
    PROC FORECAST statement, 837
    SOLVE statement (MODEL), 1051
OUTS= option
    ENTROPY procedure, 686
    FIT statement (MODEL), 1038, 1076, 1162
OUTSEASON= option
    PROC TIMESERIES statement, 1858
OUTSELECT= option
    PROC DATASOURCE statement, 584
OUTSELECT=OFF option
    LIBNAME statement (SASEHAVR), 2561
OUTSELECT=ON option
    LIBNAME statement (SASEHAVR), 2561
OUTSEQUENCE= option
    PROC SIMILARITY statement, 1597
OUTSN= option
    FIT statement (MODEL), 1038
OUTSPAN= option
    PROC X11 statement, 2237, 2265
    VAR statement (X11), 2265
OUTSPECTRA= option
    PROC TIMESERIES statement, 1858
OUTSSA= option
    PROC TIMESERIES statement, 1858
OUTSSCP= option
    PROC SYSLIN statement, 1783, 1804
OUTSTAT= option
    DFTEST macro, 159
    ESTIMATE statement (ARIMA), 237, 272
    PROC ESM statement, 731
    PROC SEVERITY statement, 1511
    PROC VARMAX statement, 2078, 2183
    PROC X12 statement, 2308
OUTSTB= option
    PROC X11 statement, 2237, 2265
OUTSTD option
    PROC FORECAST statement, 837
OUTSUM= option
    FIXED statement (LOAN), 888
    PROC ESM statement, 731
    PROC LOAN statement, 885, 898
    PROC SIMILARITY statement, 1597
    PROC TIMESERIES statement, 1858
OUTSUSED= option
    ENTROPY procedure, 686
    FIT statement (MODEL), 1038, 1076, 1162
OUTTDR= option
    PROC X11 statement, 2237, 2266
OUTTRANS= option
    PROC PANEL statement, 1318, 1370
OUTTREND= option
    PROC TIMESERIES statement, 1858
OUTUNWGTRESID option
    FIT statement (MODEL), 1038
OUTV= option
    FIT statement (MODEL), 1039, 1159, 1163
OUTVARS statement
    MODEL procedure, 1047
OVDIFCR= option
    ARIMA statement (X11), 2239
OVERID option
    MODEL statement (SYSLIN), 1787
OVERPRINT option
    ROWS statement (COMPUTAB), 477
P option
    IRREGULAR statement (UCM), 1962
    PROC SPECTRA statement, 1694
P= option
    ESTIMATE statement (ARIMA), 236
    FIXED statement (LOAN), 886
    GARCH statement, 2100
    IDENTIFY statement (ARIMA), 233
    MODEL statement (AUTOREG), 349
    MODEL statement (VARMAX), 2092
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (MDC), 946
    OUTPUT statement (PANEL), 1328
    OUTPUT statement (PDLREG), 1405
    OUTPUT statement (SIMLIN), 1666
_PAGE_ option
    COLUMNS statement (COMPUTAB), 475
    ROWS statement (COMPUTAB), 477
PANEL procedure, 1316
    syntax, 1316
PARAMETERS statement
    MODEL procedure, 1047, 1200
PARKS option
    MODEL statement (PANEL), 1326
    MODEL statement (TSCSREG), 1929
PARMS= option
    FIT statement (MODEL), 1034
PARMSDATA= option
    PROC MODEL statement, 1020, 1157
    SOLVE statement (MODEL), 1051
PARMTOL= option
    PROC STATESPACE statement, 1733
PARTIAL option
    MODEL statement (AUTOREG), 354
    MODEL statement (PDLREG), 1404
PASTMIN= option
    PROC STATESPACE statement, 1732
PATH= option
    TARGET statement (SIMILARITY), 1608
PAYMENT= option
    FIXED statement (LOAN), 886
PCHOW= option
    FIT statement (MODEL), 1041, 1131
    MODEL statement (AUTOREG), 354
PDATA= option
    ENTROPY procedure, 686
    ENTROPY statement, 707
%PDL macro, 1153
PDLREG procedure, 1399
    syntax, 1399
PDWEIGHTS statement
    X11 procedure, 2246
PERIOD= option
    CYCLE statement (UCM), 1953
PERIODOGRAM option
    PROC X12 statement, 2307
PERMCO= option
    LIBNAME statement (SASECRSP), 2407
PERMNO= option
    LIBNAME statement (SASECRSP), 2404
PERROR= option
    IDENTIFY statement (ARIMA), 233
PH option
    PROC SPECTRA statement, 1694
PHI option
    MODEL statement (PANEL), 1327
    MODEL statement (TSCSREG), 1929
PHI= option
    DEPLAG statement (UCM), 1954
PLOT
    HAXIS=, 87
PLOT option
    AUTOREG statement (UCM), 1950
    BLOCKSEASON statement (UCM), 1951
    CYCLE statement (UCM), 1953
    ESTIMATE statement (ARIMA), 237
    ESTIMATE statement (UCM), 1956
    FORECAST statement (UCM), 1958
    IRREGULAR statement (UCM), 1960
    MODEL statement (SYSLIN), 1787
    PROC ARIMA statement, 228
    PROC UCM statement, 1946
    PROC X12 statement, 2308
    RANDOMREG statement (UCM), 1966
    SEASON statement (UCM), 1968
    SLOPE statement (UCM), 1970
    SPLINEREG statement (UCM), 1971
    SPLINESEASON statement (UCM), 1972
PLOT= option
    PROC ESM statement, 731
    PROC TIMEID statement, 1828
PLOTS option
    PROC ARIMA statement, 228
    PROC AUTOREG statement, 346
    PROC ENTROPY statement, 687
    PROC MODEL statement, 1020
    PROC PANEL statement, 1318
    PROC UCM statement, 1946
    PROC X12 statement, 2308
PLOTS= option
    PROC EXPAND statement, 774
    PROC SEVERITY statement, 1513
    PROC SIMILARITY statement, 1597
    PROC TIMESERIES statement, 1858
PM= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PDLREG), 1405
PMFACTOR= option
    MONTHLY statement (X11), 2243
PNT= option
    FIXED statement (LOAN), 887
PNTPCT= option
    FIXED statement (LOAN), 887
POINTPCT= option
    FIXED statement (LOAN), 887
POINTS= option
    FIXED statement (LOAN), 887
POISSON option
    ERRORMODEL statement (MODEL), 1030
POOLED option
    MODEL statement (PANEL), 1327
POWER= option
    TRANSFORM statement (X12), 2332
PRC= option
    FIXED statement (LOAN), 888
PRED
    OUTPUT statement (COUNTREG), 528
PREDEFINED= option
    ADJUST statement (X12), 2314
    PREDEFINED statement (X12), 2327
PREDICTED
    OUTPUT statement (QLIM), 1440
PREDICTED= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PANEL), 1328
    OUTPUT statement (PDLREG), 1405
    OUTPUT statement (SIMLIN), 1666
    OUTPUT statement (SYSLIN), 1788
PREDICTEDM= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PDLREG), 1405
PREPAYMENTS= option
    FIXED statement (LOAN), 887
PRICE= option
    FIXED statement (LOAN), 888
PRIMAL option
    ENTROPY procedure, 687
PRINT option
    AUTOREG statement (UCM), 1950
    BLOCKSEASON statement (UCM), 1952
    CYCLE statement (UCM), 1953
    ESTIMATE statement (UCM), 1957
    FORECAST statement (UCM), 1959
    IRREGULAR statement (UCM), 1960
    LEVEL statement (UCM), 1964
    OUTLIER statement (UCM), 1965
    PROC STATESPACE statement, 1733
    RANDOMREG statement (UCM), 1966
    SEASON statement (UCM), 1969
    SLOPE statement (UCM), 1970
    SPLINESEASON statement (UCM), 1972
    SSPAN statement (X11), 2249
    STEST statement (SYSLIN), 1792
    TEST statement (SYSLIN), 1794
PRINT= option
    AUTOMDL statement (X12), 2321
    BOXCOXAR macro, 156
    CHECK statement (X12), 2316
    LOGTEST macro, 161
    MODEL statement (VARMAX), 2089
    PROC SEVERITY statement, 1512
    PROC SIMILARITY statement, 1598
    PROC TIMEID statement, 1829
    PROC TIMESERIES statement, 1859
PRINT=(CORRB) option
    MODEL statement (VARMAX), 2089
PRINT=(CORRX) option
    MODEL statement (VARMAX), 2089
PRINT=(CORRY) option
    MODEL statement (VARMAX), 2089, 2128
PRINT=(COVB) option
    MODEL statement (VARMAX), 2089
PRINT=(COVPE) option
    MODEL statement (VARMAX), 2089, 2123
PRINT=(COVX) option
    MODEL statement (VARMAX), 2089
PRINT=(COVY) option
    MODEL statement (VARMAX), 2089
PRINT=(DECOMPOSE) option
    MODEL statement (VARMAX), 2089, 2125
PRINT=(DIAGNOSE) option
    MODEL statement (VARMAX), 2090
PRINT=(DYNAMIC) option
    MODEL statement (VARMAX), 2090, 2109
PRINT=(ESTIMATES) option
    MODEL statement (VARMAX), 2090
PRINT=(IARR) option
    MODEL statement (VARMAX), 2063, 2090
PRINT=(IMPULSE) option
    MODEL statement (VARMAX), 2116
PRINT=(IMPULSE=) option
    MODEL statement (VARMAX), 2090
PRINT=(IMPULSX) option
    MODEL statement (VARMAX), 2112
PRINT=(IMPULSX=) option
    MODEL statement (VARMAX), 2091
PRINT=(PARCOEF) option
    MODEL statement (VARMAX), 2091, 2129
PRINT=(PCANCORR) option
    MODEL statement (VARMAX), 2091, 2131
PRINT=(PCORR) option
    MODEL statement (VARMAX), 2091, 2130
PRINT=(ROOTS) option
    MODEL statement (VARMAX), 2091, 2134
PRINT=(YW) option
    MODEL statement (VARMAX), 2092
PRINT= option
    PROC ESM statement, 732
PRINTALL option
    ARIMA statement (X11), 2239
    ESTIMATE statement (ARIMA), 239
    FIT statement (MODEL), 1041
    FORECAST statement (ARIMA), 242
    MODEL statement, 528
    MODEL statement (VARMAX), 2088
    PROC MODEL statement, 1023
    PROC QLIM statement, 1431
    PROC UCM statement, 1949
    SOLVE statement (MODEL), 1054
    SSPAN statement (X11), 2250
PRINTDETAILS option
    PROC ESM statement, 732
    PROC SIMILARITY statement, 1598
    PROC TIMESERIES statement, 1860
PRINTERR option
    ESTIMATE statement (X12), 2317
PRINTFORM= option
    MODEL statement (VARMAX), 2088, 2112
PRINTFP option
    ARIMA statement (X11), 2239
PRINTOUT= option
    MONTHLY statement (X11), 2243
    PROC STATESPACE statement, 1731
    QUARTERLY statement (X11), 2248
PRINTREG option
    IDENTIFY statement (X12), 2319
PRIOR option
    MODEL statement (VARMAX), 2096
PRIOR=(IVAR) option
    MODEL statement (VARMAX), 2097
PRIOR=(LAMBDA=) option
    MODEL statement (VARMAX), 2097
PRIOR=(MEAN=) option
    MODEL statement (VARMAX), 2097
PRIOR=(NREP=) option
    MODEL statement (VARMAX), 2098
PRIOR=(THETA=) option
    MODEL statement (VARMAX), 2098
PRIORS statement
    ENTROPY procedure, 692
PRL= option
    FIT statement (MODEL), 1034, 1133
PROB
    OUTPUT statement (COUNTREG), 528
    OUTPUT statement (QLIM), 1440
PROBALL
    OUTPUT statement (QLIM), 1440
PROBCOUNT
    OUTPUT statement (COUNTREG), 529
PROBDF
    function, 162
    macro, 162
PROBOBSERVED= option
    MODEL statement, 1515
PROBZERO
    OUTPUT statement (COUNTREG), 529
PROC ARIMA statement, 227
PROC AUTOREG
    OUTEST=, 410
PROC AUTOREG statement, 346
PROC COMPUTAB
    NOTRANS, 481
    OUT=, 491
PROC COMPUTAB statement, 472
PROC DATASOURCE statement, 581
PROC ENTROPY statement, 685
PROC ESM statement, 730
PROC EXPAND statement, 773
PROC FORECAST statement, 834
PROC LOAN statement, 884
PROC MDC statement, 934
PROC MODEL statement, 1020
PROC PANEL statement, 1318
PROC PDLREG statement, 1401
PROC SEVERITY statement, 1511
PROC SIMILARITY statement, 1596
PROC SIMLIN statement, 1663
PROC SPECTRA statement, 1693
PROC STATESPACE statement, 1730
PROC SYSLIN statement, 1782
PROC TIMEID statement, 1828
PROC TIMESERIES statement, 1857
PROC TSCSREG statement, 1926
PROC UCM statement, 1946, see UCM procedure
PROC VARMAX statement, 2077
PROC X11 statement, 2236
PROC X12 statement, 2305
PRODUCTION option
    ENDOGENOUS statement (QLIM), 1436
PROFILE
    ESTIMATE statement (UCM), 1957
PROJECT= option
    FORECAST command (TSFS), 2775
PSEUDO= option
    SOLVE statement (MODEL), 1053
PURE option
    ENTROPY procedure, 685
PURGE option
    RESET statement (MODEL), 1049
PUT, 1214
PWC option
    COMPARE statement (LOAN), 893
PWOFCOST option
    COMPARE statement (LOAN), 893
Q option
    IRREGULAR statement (UCM), 1962
Q= option
    ESTIMATE statement (ARIMA), 237
    GARCH statement, 2100
    IDENTIFY statement (ARIMA), 233
    MODEL statement (AUTOREG), 349
    MODEL statement (VARMAX), 2092, 2144
QLIM procedure, 1428
    syntax, 1428
QLIM procedure, CLASS statement, 1433
QLIM procedure, FREQ statement, 1436
QLIM procedure, TEST statement, 1441
QLIM procedure, WEIGHT statement, 1442
QTR function, 151
QUARTERLY statement
    X11 procedure, 2246
QUASI= option
    SOLVE statement (MODEL), 1053
QUIET= option
    FCMPOPT statement (SIMILARITY), 1599
R= option
    FIXED statement (LOAN), 886
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PANEL), 1328
    OUTPUT statement (PDLREG), 1405
    OUTPUT statement (SIMLIN), 1666
RANDINIT option
    MODEL statement (MDC), 938
RANDNUM= option
    MODEL statement (MDC), 938
RANDOM= option
    SOLVE statement (MODEL), 1053, 1170, 1185
RANDOMREG
    UCM procedure, 1966
RANGE, 1203
RANGE option
    MODEL statement (ENTROPY), 692
RANGE statement
    DATASOURCE procedure, 588
    MODEL procedure, 1048
RANGE= option
    LIBNAME statement (SASECRSP), 2409
    LIBNAME statement (SASEFAME), 2505
RANK option
    MODEL statement (MDC), 939
RANK= option
    COINTEG statement (VARMAX), 2084, 2164
RANONE option
    MODEL statement (PANEL), 1327
    MODEL statement (TSCSREG), 1929
RANTWO option
    MODEL statement (PANEL), 1327
    MODEL statement (TSCSREG), 1929
RAO option
    TEST statement (ENTROPY), 694
    TEST statement (MODEL), 1056
RATE= option
    FIXED statement (LOAN), 886
RECFM= option
    PROC DATASOURCE statement, 583
RECPEV= option
    OUTPUT statement (AUTOREG), 369
RECRES= option
    OUTPUT statement (AUTOREG), 369
REDUCECV= option
    AUTOMDL statement (X12), 2322
REDUCED option
    PROC SYSLIN statement, 1785
REEVAL option
    FORECAST command (TSFS), 2777
REFIT option
    FORECAST command (TSFS), 2777
REGRESSION statement
    X12 procedure, 2326
Remote Fame data access
    physical name using #port number, 2503
RENAME statement
    DATASOURCE procedure, 591
REPLACEBACK option
    FORECAST statement (ESM), 734
REPLACEMISSING option
    FORECAST statement (ESM), 734
RESET option
    MODEL statement (AUTOREG), 354
RESET statement
    MODEL procedure, 1049
RESIDDATA= option
    SOLVE statement (MODEL), 1051
RESIDEST option
    PROC STATESPACE statement, 1733
RESIDUAL
    OUTPUT statement (QLIM), 1440
RESIDUAL= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PANEL), 1328
    OUTPUT statement (PDLREG), 1405
    OUTPUT statement (SIMLIN), 1666
    OUTPUT statement (SYSLIN), 1788
RESIDUALM= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PDLREG), 1405
RESTART option
    MODEL statement (MDC), 939
RESTRICT statement
    AUTOREG procedure, 364
    COUNTREG procedure, 529
    ENTROPY procedure, 692
    MDC procedure, 946
    MODEL procedure, 1049
    PDLREG procedure, 1406
    QLIM procedure, 1440
    STATESPACE procedure, 1735
    SYSLIN procedure, 1789
    VARMAX procedure, 2102
RETAIN statement
    MODEL procedure, 1216
RHO option
    MODEL statement (PANEL), 1327
    MODEL statement (TSCSREG), 1929
RHO= option
    AUTOREG statement (UCM), 1950
    CYCLE statement (UCM), 1953
RIGHTCENSORED= option
    MODEL statement, 1515
RKNOTS option
    SPLINESEASON statement (UCM), 1972
RM= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PDLREG), 1405
ROBUST option
    MODEL statement (PANEL), 1327
ROUND=NONE option
    FIXED statement (LOAN), 888
ROUND= option
    FIXED statement (LOAN), 888
‘row titles’ option
    ROWS statement (COMPUTAB), 476
ROWS statement
    COMPUTAB procedure, 475
RSLB= option
    MODEL statement, 1517
RTS= option
    PROC COMPUTAB statement, 473
RUNS option
    MODEL statement (AUTOREG), 355
S option
    IRREGULAR statement (UCM), 1962
    PROC SPECTRA statement, 1694
SAMESCALE option
    MODEL statement (MDC), 939
SAR option
    IRREGULAR statement (UCM), 1962
SATISFY= option
    SOLVE statement (MODEL), 1050
SCALE= option
    INPUT statement (SIMILARITY), 1603
SCAN option
    IDENTIFY statement (ARIMA), 233
SCENTER option
    MODEL statement (VARMAX), 2087
SCHEDULE option
    FIXED statement (LOAN), 889
SCHEDULE= option
    FIXED statement (LOAN), 889
SCHEDULE=YEARLY option
    FIXED statement (LOAN), 889
SDATA= option
    ENTROPY procedure, 686, 707
    FIT statement (MODEL), 1039, 1157, 1236
    SOLVE statement (MODEL), 1051, 1170, 1198
SDIAG option
    PROC SYSLIN statement, 1784
SDIF= option
    INPUT statement (SIMILARITY), 1603
    TARGET statement (SIMILARITY), 1608
    VAR statement (TIMESERIES), 1875
SDIFF= option
    IDENTIFY statement (X12), 2319
SEASON statement
    TIMESERIES procedure, 1868
    UCM procedure, 1966
SEASONALITY= option
    PROC ESM statement, 732
    PROC SIMILARITY statement, 1598
    PROC TIMESERIES statement, 1860
SEASONALMA= option
    X11 statement (X12), 2336
SEASONS= option
    PROC FORECAST statement, 837
    PROC X12 statement, 2306
SECOND function, 151
SEED= option
    MODEL statement (MDC), 940
    QLIM procedure, 1432
    SOLVE statement (MODEL), 1053, 1170
SEIDEL option
    SOLVE statement (MODEL), 1053
SELECT, 1215
SELECT option
    ENDOGENOUS statement (QLIM), 1436
SETID= option
    LIBNAME statement (SASECRSP), 2403
SETMISSING= option
    FORECAST statement (ESM), 734
    ID statement (ESM), 737
    ID statement (SIMILARITY), 1601
    ID statement (TIMESERIES), 1867
    INPUT statement (SIMILARITY), 1603
    TARGET statement (SIMILARITY), 1609
    VAR statement (TIMESERIES), 1875
SEVERITY procedure, 1509
    syntax, 1509
SHORT= option
    LIBNAME statement (SASEHAVR), 2560
SICCD= option
    LIBNAME statement (SASECRSP), 2408
SIGCORR= option
    PROC STATESPACE statement, 1732
SIGMA= option
    OUTLIER statement (ARIMA), 240
SIGMALIM= option
    X11 statement (X12), 2337
SIGSQ= option
    FORECAST statement (ARIMA), 242
SIMILARITY procedure, 1594
    syntax, 1594
SIMLIN procedure, 1662
    syntax, 1662
SIMPLE option
    PROC SYSLIN statement, 1785
SIMULATE option
    SOLVE statement (MODEL), 1052
SIN function, 1208
SINGLE option
    SOLVE statement (MODEL), 1053, 1190
SINGULAR= option
    ESTIMATE statement (ARIMA), 239
    FIT statement (MODEL), 1042
    MODEL statement (PANEL), 1327
    MODEL statement (TSCSREG), 1929
    PROC FORECAST statement, 837
    PROC STATESPACE statement, 1733
    PROC SYSLIN statement, 1784
SINH function, 1208
SINTPER= option
    PROC FORECAST statement, 838
SKIP option
    ROWS statement (COMPUTAB), 477
SKIPFIRST= option
    ESTIMATE statement (UCM), 1957
    FORECAST statement (UCM), 1959
SKIPLAST= option
    ESTIMATE statement (UCM), 1955
SLAG option
    LAG statement (PANEL), 1323
SLAG statement
    LAG statement (PANEL), 1323
SLENTRY= option
    PROC FORECAST statement, 838
SLIDE= option
    TARGET statement (SIMILARITY), 1609
SLOPE statement
    UCM procedure, 1969
SLSTAY= option
    MODEL statement (AUTOREG), 360
    PROC FORECAST statement, 838
SMA option
    IRREGULAR statement (UCM), 1963
SOLVE statement
    MODEL procedure, 1050
SOLVEPRINT option
    SOLVE statement (MODEL), 1054
SORTNAMES option
    PROC ESM statement, 732
    PROC SIMILARITY statement, 1598
    PROC TIMESERIES statement, 1860
SOURCE= option
    LIBNAME statement (SASEHAVR), 2560
SP option
    IRREGULAR statement (UCM), 1963
SPAN= option
    OUTLIER statement (X12), 2324
    PROC X12 statement, 2306
SPECTRA procedure, 1692
    syntax, 1692
SPECTRA statement
    TIMESERIES procedure, 1869
SPECTRUMSERIES= option
    PROC X12 statement, 2307
SPLINEREG
    UCM procedure, 1970
SPLINESEASON
    UCM procedure, 1971
SPSCALE option
    MODEL statement (MDC), 940
SQ option
    IRREGULAR statement (UCM), 1963
SQRT function, 1208
SRESTRICT statement
    SYSLIN procedure, 1790
SSA statement
    TIMESERIES procedure, 1871
SSPAN statement
    X11 procedure, 2249
START= option
    FIT statement (MODEL), 1034, 1084, 1229
    FIXED statement (LOAN), 888
    ID statement (ESM), 738
    ID statement (SIMILARITY), 1602
    ID statement (TIMESERIES), 1868
    LIBNAME statement (SASEHAVR), 2559
    MODEL statement (AUTOREG), 361
    MONTHLY statement (X11), 2243
    PROC FORECAST statement, 838
    PROC SIMLIN statement, 1664
    PROC X12 statement, 2306
    QUARTERLY statement (X11), 2248
    SOLVE statement (MODEL), 1052
STARTITER option
    FIT statement (MODEL), 1085
STARTITER= option
    FIT statement (MODEL), 1043
STARTSUM= option
    PROC ESM statement, 732
STARTUP= option
    MODEL statement (AUTOREG), 350
STAT= option
    FORECAST command (TSFS), 2776
STATESPACE procedure, 1728
    syntax, 1728
STATIC option
    FIT statement (MODEL), 1120
    SOLVE statement (MODEL), 1052, 1118, 1166
STATIONARITY= option
    IDENTIFY statement (ARIMA), 234
    MODEL statement (AUTOREG), 355
STATS option
    SOLVE statement (MODEL), 1054, 1184
STB option
    MODEL statement (PDLREG), 1404
    MODEL statement (SYSLIN), 1787
STD= option
    HETERO statement (AUTOREG), 363
STEST statement
    SYSLIN procedure, 1791
SUMBY statement
    COMPUTAB procedure, 480
SUMMARY option
    MONTHLY statement (X11), 2244
    QUARTERLY statement (X11), 2248
SUMONLY option
    PROC COMPUTAB statement, 473
SUR option
    ENTROPY procedure, 685
    FIT statement (MODEL), 1037, 1060, 1232
    PROC SYSLIN statement, 1784
SYSLIN procedure, 1780
    syntax, 1780
T option
    ERRORMODEL statement (MODEL), 1030, 1072
TABLES statement
    X11 procedure, 2250
    X12 procedure, 2332
TABLES table names
    TABLES statement (X12), 2332
TAN function, 1208
TANH function, 1208
TARGET statement
    SIMILARITY procedure, 1604
TAX= option
    COMPARE statement (LOAN), 893
TAXRATE= option
    COMPARE statement (LOAN), 893
TCCV= option
    OUTLIER statement (X12), 2325
TDCOMPUTE= option
    MONTHLY statement (X11), 2244
TDCUTOFF= option
    SSPAN statement (X11), 2249
TDREGR= option
    MONTHLY statement (X11), 2244
TE1
    OUTPUT statement (QLIM), 1440
TE2
    OUTPUT statement (QLIM), 1440
TECH= option
    ENTROPY procedure, 688
TECHNIQUE= option
    ENTROPY procedure, 688
TEST statement
    AUTOREG procedure, 365
    ENTROPY procedure, 693
    MODEL procedure, 1055
    SYSLIN procedure, 1792
    VARMAX procedure, 2072, 2103
TEST= option
    HETERO statement (AUTOREG), 364
THEIL option
    SOLVE statement (MODEL), 1054, 1184
THRESHOLDPCT option
    SSA statement (TIMESERIES), 1872
TI option
    COMPARE statement (LOAN), 893
TICKER= option
    LIBNAME statement (SASECRSP), 2408
TIME function, 152
TIME option
    MODEL statement (PANEL), 1327
TIME= option
    FIT statement (MODEL), 1039, 1124
    SOLVE statement (MODEL), 1051, 1124
TIMEID procedure, 1826
    syntax, 1826
TIMEPART function, 96, 152
TIMESERIES procedure, 1854
    syntax, 1854
TIN=, 777
_TITLES_ option
    COLUMNS statement (COMPUTAB), 475
TO= option
    PROC EXPAND statement, 774, 778
TODAY function, 152
TOL= option
    ESTIMATE statement (X12), 2317
TOTAL option
    PROC SIMLIN statement, 1664
TOUT=, 777
TP option
    MODEL statement (AUTOREG), 359
TR option
    MODEL statement (AUTOREG), 350
TRACE option
    PROC MODEL statement, 1024
TRACE= option
    FCMPOPT statement (SIMILARITY), 1599
TRANSFORM statement
    X12 procedure, 2332
TRANSFORM=, 777
TRANSFORM= option
    ARIMA statement (X11), 2239
    FORECAST statement (ESM), 734
    INPUT statement (SIMILARITY), 1604
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PDLREG), 1406
    TARGET statement (SIMILARITY), 1609
    VAR statement (TIMESERIES), 1875
TRANSFORMIN= option
    CONVERT statement (EXPAND), 777, 786
TRANSFORMOUT= option
    CONVERT statement (EXPAND), 777, 786
TRANSIN=, 777
TRANSOUT=, 777
TRANSPOSE option
    SSA statement (TIMESERIES), 1873
TRANSPOSE procedure, 117
TRANSPOSE= option
    CORR statement (TIMESERIES), 1862
    CROSSCORR statement (TIMESERIES), 1863
    DECOMP statement (TIMESERIES), 1865
    SEASON statement (TIMESERIES), 1869
    TREND statement (TIMESERIES), 1874
TREND statement
    TIMESERIES procedure, 1873
TREND= option
    DFPVALUE macro, 157
    DFTEST macro, 159
    MODEL statement (VARMAX), 2087
    PROC FORECAST statement, 838
TREND=LINEAR option
    MODEL statement (VARMAX), 2161
TRENDADJ option
    MONTHLY statement (X11), 2244
    QUARTERLY statement (X11), 2248
TRENDMA= option
    MONTHLY statement (X11), 2244
    X11 statement (X12), 2338
TRIMMISS= option
    INPUT statement (SIMILARITY), 1604
TRIMMISSING= option
    INPUT statement (SIMILARITY), 1610
TRUEINTEREST option
    COMPARE statement (LOAN), 893
TRUNCATED option
    ENDOGENOUS statement (QLIM), 1435
TSCSREG procedure, 1925
    syntax, 1925
TSFS, 2607
TSFS procedure, 2607
TSNAME= option
    PROC PANEL statement, 1321
TSVIEW macro, 2774
TUNECHLI= option
    LIBNAME statement (SASEFAME), 2507
TUNEFAME= option
    LIBNAME statement (SASEFAME), 2507
TWOSTEP option
    MODEL statement (PANEL), 1327
TYPE= option
    BLOCKSEASON statement (UCM), 1952
    FIT statement (MODEL), 1039
    MODEL statement (AUTOREG), 349
    MODEL statement (MDC), 940
    OUTLIER statement (ARIMA), 240
    OUTLIER statement (X12), 2324
    PROC DATASOURCE statement, 583
    PROC SIMLIN statement, 1664
    SEASON statement (UCM), 1969
    SOLVE statement (MODEL), 1051
    TEST statement (AUTOREG), 366
    X11 statement (X12), 2338
U option
    ERRORMODEL statement (MODEL), 1030
UCL= option
    OUTPUT statement (AUTOREG), 369
    OUTPUT statement (PDLREG), 1406
UCLM= option
    OUTPUT statement (AUTOREG), 370
    OUTPUT statement (PDLREG), 1406
UCM procedure, 1943
    syntax, 1943
UCM procedure, PROC UCM statement
    PLOT option, 1946
UL option
    ROWS statement (COMPUTAB), 477
UNIFORMEC= option
    MODEL statement (MDC), 938
UNIFORMPARM= option
    MODEL statement (MDC), 938
UNITSCALE= option
    MODEL statement (MDC), 937
UNITVARIANCE= option
    MODEL statement (MDC), 940
UNREST option
    MODEL statement (SYSLIN), 1788
UPPERBOUND= option
    ENDOGENOUS statement (QLIM), 1435
URSQ option
    MODEL statement (AUTOREG), 359
USE= option
    FORECAST statement (ESM), 735
USERDEFINED statement
    X12 procedure, 2334
USSCP option
    PROC SYSLIN statement, 1785
USSCP2 option
    PROC SYSLIN statement, 1785
UTILITY statement
    MDC procedure, 948
VAR option
    MODEL statement (PANEL), 1325
    MODEL statement (TSCSREG), 1928
VAR statement
    FORECAST procedure, 839
    MODEL procedure, 1056, 1200
    SPECTRA procedure, 1695
    STATESPACE procedure, 1735
    SYSLIN procedure, 1794
    TIMESERIES procedure, 1874
    X11 procedure, 2250
    X12 procedure, 2334
VAR= option
    FORECAST command (TSFS), 2774, 2775
    IDENTIFY statement (ARIMA), 235
    TSVIEW command (TSFS), 2774, 2775
VARDEF= option
    FIT statement (MODEL), 1037, 1062, 1076
    MODEL statement (VARMAX), 2087
    PROC ENTROPY statement, 685
    PROC SEVERITY statement, 1511
    PROC SYSLIN statement, 1785
VARIANCE= option
    AUTOREG statement (UCM), 1950
    BLOCKSEASON statement (UCM), 1952
    CYCLE statement (UCM), 1953
    IRREGULAR statement (UCM), 1960
    LEVEL statement (UCM), 1964
    RANDOMREG statement (UCM), 1966
    SEASON statement (UCM), 1969
    SLOPE statement (UCM), 1970
    SPLINEREG statement (UCM), 1971
    SPLINESEASON statement (UCM), 1972
VARLIST statement, 934
VARMAX procedure, 2074
    syntax, 2074
VCOMP= option
    MODEL statement (PANEL), 1327
VDATA= option
    FIT statement (MODEL), 1039, 1158
VNRRANK option
    MODEL statement (AUTOREG), 360
W option
    ARM statement (LOAN), 891
WALD option
    TEST statement (ENTROPY), 694
    TEST statement (MDC), 947
    TEST statement (MODEL), 1056
    TEST statement (PANEL), 1329
    TEST statement (QLIM), 1441
WEEK function, 152
WEEKDAY function, 95, 152
WEIGHT statement, 1102
    ENTROPY procedure, 694
    MODEL procedure, 1056
    SYSLIN procedure, 1794
WEIGHT= option
    PROC FORECAST statement, 838
WEIGHTS option
    SPECTRA statement (TIMESERIES), 1871
WEIGHTS statement
    SPECTRA procedure, 1695
WHEN, 1215
WHERE statement
    DATASOURCE procedure, 587
WHITE option
    FIT statement (MODEL), 1041
WHITENOISE= option
    ESTIMATE statement (ARIMA), 237
    IDENTIFY statement (ARIMA), 235
WHITETEST option
    PROC SPECTRA statement, 1694
WILDCARD= option
    LIBNAME statement (SASEFAME), 2504
WISHART= option
    SOLVE statement (MODEL), 1053
WORSTCASE option
    ARM statement (LOAN), 891
X11 procedure, 2234
    syntax, 2234
X11 statement
    X12 procedure, 2334
X12 procedure, 2301
    syntax, 2301
X12 procedure, PROC X12 statement
    PLOT option, 2308
XBETA
    OUTPUT statement (COUNTREG), 528
    OUTPUT statement (QLIM), 1440
XBETA= option
    OUTPUT statement (MDC), 946
XLAG option
    LAG statement (PANEL), 1323
XLAG statement
    LAG statement (PANEL), 1323
XLAG= option
    MODEL statement (VARMAX), 2093
XPX option
    FIT statement (MODEL), 1041, 1093
    MODEL statement (PDLREG), 1404
    MODEL statement (SYSLIN), 1788
XREF option
    PROC MODEL statement, 1023, 1219
YEAR function, 95, 152
YRAHEADOUT option
    PROC X11 statement, 2237
YYQ function, 95, 152
ZERO= option
    COLUMNS statement (COMPUTAB), 475
    ROWS statement (COMPUTAB), 477
ZEROMISS option
    PROC FORECAST statement, 838
ZEROMISS= option
    FORECAST statement (ESM), 735
    INPUT statement (SIMILARITY), 1604
    TARGET statement (SIMILARITY), 1610
ZEROMISSING= option
    ID statement (PROC ESM), 738
    ID statement (SIMILARITY), 1602
ZEROMODEL statement
    COUNTREG procedure, 530
ZEROWEIGHT= option
    MONTHLY statement (X11), 2244
    QUARTERLY statement (X11), 2248
ZGAMMA
    OUTPUT statement (COUNTREG), 529
ZLAG option
    LAG statement (PANEL), 1323
ZLAG statement
    LAG statement (PANEL), 1323
Your Turn

We welcome your feedback. If you have comments about this book, please send them to [email protected]. Include the full title and page numbers (if applicable). If you have comments about the software, please send them to [email protected].