SAS. Linear Regression (1185357), page 2
Copyright © 2013, SAS Institute Inc. All rights reserved.

WHEN THE CONSTANT VARIANCE ASSUMPTION IS VIOLATED
Request tests using the heteroscedasticity‐consistent variance estimates.
Transform the dependent variable.
Model the nonconstant variance by using:
• PROC GENMOD or PROC GLIMMIX with the appropriate DIST= option
• PROC MIXED with the GROUP= option and TYPE= option
• SAS SURVEY procedures for survey data
• SAS/ETS procedures for time‐series data
• Weighted least squares regression model

ASSUMPTIONS CONSTANT VARIANCE: PLOTS
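As a rough illustration of two of the remedies listed under WHEN THE CONSTANT VARIANCE ASSUMPTION IS VIOLATED (transforming the dependent variable and weighted least squares), the sketch below reuses the sasuser.cars2 data and the price model that appear elsewhere in these slides. The weight 1/horsepower**2 is purely an assumption made for illustration (it presumes that the error variance grows with the square of horsepower). The data set name cars_wls and the variables log_price and w are hypothetical.

```sas
/* Sketch only: the form of the weights is an assumption, not taken from the slides */
data cars_wls;                      /* hypothetical working data set */
   set sasuser.cars2;
   log_price = log(price);          /* remedy: transform the dependent variable (needs price > 0) */
   w = 1 / horsepower**2;           /* remedy: assumed weights, variance taken as ~ horsepower**2 */
run;

/* Model for the log-transformed response */
proc reg data=cars_wls;
   model log_price = hwympg hwympg2 horsepower;
run;
quit;

/* Weighted least squares on the original scale */
proc reg data=cars_wls;
   weight w;
   model price = hwympg hwympg2 horsepower;
run;
quit;
```

In practice the weights would be chosen from the residual plots or from subject knowledge, and the constant variance checks would be repeated on the refitted model.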
ASSUMPTIONS CONSTANT VARIANCE: TESTS

    model Y = X1 X2 X3 / white hcc hccmethod=0;

Parameter Estimates

                                                              Heteroscedasticity Consistent
Variable     DF   Parameter   Standard   t Value   Pr > |t|   Standard   t Value   Pr > |t|
                  Estimate    Error                           Error
Intercept     1       4.04       2.17      1.86       0.07       2.68      1.51       0.14
Hwympg        1      -0.80       0.21     -3.76       0.00       0.19     -4.16     <.0001
Hwympg2       1       0.04       0.01      3.04       0.00       0.01      4.21     <.0001
Horsepower    1       0.10       0.02      6.03     <.0001       0.02      4.72     <.0001

Test of First and Second Moment Specification

    model Y = X1 X2 X3 / spec;

DF   Chi-Square   Pr > ChiSq
 8        16.49       0.0359

WARNING: The average covariance matrix for the SPEC test has been deemed singular which violates an assumption of the test. Use caution when interpreting the results of the test.

proc corr [next slide …]
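For reference, the heteroscedasticity‐consistent covariance estimate requested with HCCMETHOD=0 is usually written as the sandwich formula below, shown next to the ordinary least squares estimate it replaces. The notation is assumed here and does not come from the slide: X is the design matrix, e_i are the ordinary residuals, s² is the mean squared error, and b is the vector of parameter estimates.

```latex
\widehat{\operatorname{Var}}_{\mathrm{OLS}}(b) = s^{2}\,(X'X)^{-1},
\qquad
\widehat{\operatorname{Var}}_{\mathrm{HC0}}(b) = (X'X)^{-1}\, X'\,\operatorname{diag}\!\left(e_{1}^{2},\dots,e_{n}^{2}\right) X \,(X'X)^{-1}
```

The two sets of standard errors in the Parameter Estimates table are the square roots of the diagonals of these two matrices. For the intercept, for example, the standard error rises from 2.17 to 2.68 under the consistent estimate.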
ASSUMPTIONS CONSTANT VARIANCE: SPEARMAN RANK CORRELATION COEFFICIENT
• The Spearman rank correlation coefficient is available as an option in PROC CORR.
• If the Spearman rank correlation coefficient between the absolute value of the residuals and the predicted values is
  • close to zero, then the variances are approximately equal
  • positive, then the variance increases as the mean increases
  • negative, then the variance decreases as the mean increases.

    proc reg data=sasuser.cars2 plots(label)=all;
       model price = hwympg hwympg2 horsepower / spec;
       output out=check r=residual p=pred;
    run;

    data check;
       set check;
       abserror=abs(residual);
    run;

    proc corr data=check spearman nosimple;
       var abserror pred;
       title 'Spearman corr.';
    run;

ASSUMPTIONS LINEAR RELATION BETWEEN E[Y] AND X
Use the diagnostic plots available via the ODS Graphics output of PROC REG to evaluate the model fit:
• Plots of residuals and studentized residuals versus predicted values
• “Residual‐Fit Spread” (or R‐F) plot
• Plots of the observed values versus the predicted values
• Partial regression leverage plots
and…
• Examine model‐fitting statistics such as R², adjusted R², AIC, SBC, and Mallows’ Cp.
• Use the LACKFIT option in the MODEL statement in PROC REG to test for lack‐of‐fit for models that have replicates for each value of the combination of the independent variables.

WHEN A STRAIGHT LINE IS INAPPROPRIATE
• Fit a polynomial regression model.
• Transform the independent variables to obtain linearity.
• Fit a nonlinear regression model using PROC NLIN if appropriate.
• Fit a nonparametric regression model using PROC LOESS.
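The LACKFIT option and the PROC LOESS alternative mentioned above could be tried along the following lines. This is only a sketch under stated assumptions: it reuses the sasuser.cars2 data and the price model from the other slides, the choice of hwympg as the single LOESS regressor is arbitrary, and smooth=0.5 is an illustrative value rather than a recommendation.

```sas
/* Lack-of-fit test: informative only when the data contain replicate
   observations for the combinations of the independent variables */
proc reg data=sasuser.cars2;
   model price = hwympg hwympg2 horsepower / lackfit;
run;
quit;

/* Nonparametric fit of price on one regressor; the smoothing
   parameter 0.5 is an assumed, purely illustrative value */
proc loess data=sasuser.cars2;
   model price = hwympg / smooth=0.5;
run;
```

Comparing the LOESS curve with the fitted polynomial gives a quick visual check of whether the quadratic term in hwympg captures the curvature.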
ASSUMPTIONS LINEAR RELATION BETWEEN E[Y] AND X: PLOTS
• Plots of residuals and studentized residuals versus predicted values
• “Residual‐Fit Spread” (or R‐F) plot
• Plots of the observed values versus the predicted values

ASSUMPTIONS LINEAR RELATION BETWEEN E[Y] AND X
Partial regression leverage plots

    model … / partial

• Residuals for the dependent variable are calculated with the selected regressor omitted.
• Residuals for the selected regressor are calculated from a model where the selected regressor is regressed on the remaining regressors.
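The two sets of residuals behind a partial regression leverage plot can also be computed by hand, which may make the construction easier to follow. The sketch below does this for horsepower in the price model used in these slides. The data set names r_y, r_x and partial and the variable names res_price and res_hp are hypothetical, and the one-to-one merge assumes that both output data sets keep the observations in the original order.

```sas
/* Step 1: residuals of the dependent variable with horsepower omitted */
proc reg data=sasuser.cars2 noprint;
   model price = hwympg hwympg2;
   output out=r_y r=res_price;
run;
quit;

/* Step 2: residuals of horsepower regressed on the remaining regressors */
proc reg data=sasuser.cars2 noprint;
   model horsepower = hwympg hwympg2;
   output out=r_x r=res_hp;
run;
quit;

/* Step 3: combine row by row (same observation order assumed) and plot */
data partial;
   merge r_y r_x;
run;

proc sgplot data=partial;
   reg x=res_hp y=res_price;
run;
```

The slope of the fitted line in this plot should match the horsepower coefficient from the full model, so points that pull that line noticeably are candidates for a closer look.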
(4) COLLINEARITY AND INFLUENTIAL OBSERVATION DETECTION
MULTIPLE LINEAR REGRESSION

WHAT ELSE CAN HAPPEN… MULTICOLLINEARITY
• Correlation statistics (PROC CORR)
• Variance inflation factors (VIF option in the MODEL statement in PROC REG)
• Condition index values (COLLIN and COLLINOINT options in the MODEL statement in PROC REG)

WHEN THERE IS MULTICOLLINEARITY
• Exclude redundant independent variables.
• Redefine independent variables.
• Use biased regression techniques such as ridge regression or principal component regression.
• Center the independent variables in polynomial regression models.
• Use PROC VARCLUS to select variables [next time]

WHAT ELSE CAN HAPPEN… MULTICOLLINEARITY

    proc reg data=sasuser.cars2 plots(label)=all;
       model price = hwympg hwympg2 horsepower / vif collin collinoint;
    run;

Parameter Estimates

Variable     DF   Parameter   Standard   t Value   Pr > |t|   Variance
                  Estimate    Error                           Inflation
Intercept     1       4.04       2.17      1.86       0.07       0.00
Hwympg        1      -0.80       0.21     -3.76       0.00       4.07
Hwympg2       1       0.04       0.01      3.04       0.00       2.27
Horsepower    1       0.10       0.02      6.03     <.0001       2.37

Collinearity Diagnostics

                      Condition   Proportion of Variation
Number   Eigenvalue   Index       Intercept   Hwympg   Hwympg2   Horsepower
1              2.18       1.00         0.01     0.00      0.03         0.01
2              1.53       1.19         0.00     0.09      0.07         0.00
3              0.27       2.85         0.03     0.32      0.69         0.00
4              0.03       9.25         0.96     0.58      0.21         0.99

Collinearity Diagnostics (intercept adjusted)

                      Condition   Proportion of Variation
Number   Eigenvalue   Index       Hwympg   Hwympg2   Horsepower
1              2.06       1.00      0.05      0.06         0.06
2              0.80       1.61      0.00      0.28         0.26
3              0.14       3.79      0.95      0.66         0.68

WHAT ELSE CAN HAPPEN… MULTICOLLINEARITY: RIDGE REG

    proc reg data=acetyl outvif outest=b ridge=0 to 0.02 by .002;
       model x4=x1 x2 x3 x1x2 x1x1;
    run;

WHAT ELSE CAN HAPPEN… INFLUENTIAL OBSERVATIONS
• RSTUDENT residual: measures the change in the residuals when an observation is deleted from the model.
• Leverage: measures how far an observation is from the cloud of observed data points, $h_i = \left(X(X'X)^{-1}X'\right)_{ii}$.
• Cook’s D: measures the simultaneous change in the parameter estimates when an observation is deleted.
• DFFITS: measures the change in predicted values when an observation is deleted from the model.
(…continued …)
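The verbal definitions of RSTUDENT, DFFITS and Cook’s D above correspond to the standard deletion formulas below. The notation is assumed for this sketch and is not taken from the slides: e_i is the ordinary residual, h_i the leverage, s the root mean squared error from the full fit, s_(i) the same quantity with observation i deleted, and p the number of parameters including the intercept.

```latex
\mathrm{RSTUDENT}_i = \frac{e_i}{s_{(i)}\sqrt{1-h_i}},
\qquad
\mathrm{DFFITS}_i = \mathrm{RSTUDENT}_i\,\sqrt{\frac{h_i}{1-h_i}},
\qquad
D_i = \frac{1}{p}\left(\frac{e_i}{s\sqrt{1-h_i}}\right)^{2}\frac{h_i}{1-h_i}
```

All three grow with both the size of the residual and the leverage, which is why an observation with a moderate residual can still be influential when h_i is close to 1.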
WHAT ELSE CAN HAPPEN… INFLUENTIAL OBSERVATIONS
• DFBETAs: measures the change in each parameter estimate when an observation is deleted from the model.

  $$\mathrm{DFBETA}_{j(i)} = \frac{b_j - b_{j(i)}}{\hat{\sigma}(b_j)}$$

• COVRATIO: measures the change in the precision of the parameter estimates when an observation is deleted from the model.

  $$\mathrm{COVRATIO}_i = \frac{\det\left(s_{(i)}^{2}\left(X_{(i)}'X_{(i)}\right)^{-1}\right)}{\det\left(s^{2}\left(X'X\right)^{-1}\right)}$$

WHEN THERE ARE INFLUENTIAL OBSERVATIONS
• Make sure that there are no data errors.
• Perform a sensitivity analysis and report results from different scenarios.
• Investigate the cause of the influential observations and redefine the model if appropriate.
• Delete the influential observations if appropriate and document the situation.
• Limit the influence of outliers by performing robust regression analysis using PROC ROBUSTREG.

WHAT ELSE CAN HAPPEN… INFLUENTIAL OBSERVATIONS
IDENTIFYING INFLUENTIAL OBSERVATIONS – SUMMARY OF SUGGESTED CUTOFFS
Source: support.sas.com on PROC REG
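The last remedy under WHEN THERE ARE INFLUENTIAL OBSERVATIONS, robust regression with PROC ROBUSTREG, might look roughly like this for the price model used in these slides. The choice of M estimation is an assumption made for illustration; the slides do not say which estimation method was intended.

```sas
/* Sketch: robust fit of the same model; METHOD=M is an assumed choice */
proc robustreg data=sasuser.cars2 method=m;
   model price = hwympg hwympg2 horsepower / diagnostics leverage;
   id model;      /* label flagged observations by car model, as in the PROC REG code */
run;
```

The DIAGNOSTICS and LEVERAGE options request tables of outliers and leverage points, which can be compared with the observations flagged by the PROC REG influence statistics.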
INFLUENTIAL OBSERVATIONS: CODE

    /* Fit the model, print influence statistics, and save diagnostics */
    proc reg data=sasuser.cars2 plots(label)=all;
       model price = hwympg hwympg2 horsepower / influence;
       id model;
       output out=check r=residual p=pred h=leverage rstudent=rstudent covratio=CVR;
       plot COVRATIO.*(hwympg hwympg2 horsepower) / vref=(0.88 1.11);
    run;

    /* Number of parameters (including the intercept) and of observations */
    %let numparms = 4;
    %let numobs = 81;

    /* Keep observations with |RSTUDENT| >= 2 or leverage >= 2p/n */
    data influence;
       set check;
       absrstud=abs(rstudent);
       if absrstud ge 2 then output;
       else if leverage ge (2*&numparms /&numobs) then output;
    run;

    proc print data=influence;
       var manufacturer model price hwympg horsepower;
    run;
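The same filtering idea can be extended to Cook’s D, DFFITS and COVRATIO. The cutoffs in the sketch below (4/n, 2·sqrt(p/n) and |COVRATIO − 1| ≥ 3p/n) are commonly cited rules of thumb and are an assumption here: the slide’s own table of suggested cutoffs is not part of the extracted text and may differ. The data set names check2 and flagged and the flag_ variables are hypothetical.

```sas
%let numparms = 4;   /* parameters including the intercept, as in the code above */
%let numobs   = 81;  /* number of observations, as in the code above */

/* Save additional influence statistics from the same model */
proc reg data=sasuser.cars2 noprint;
   model price = hwympg hwympg2 horsepower;
   output out=check2 cookd=cooksd dffits=dffits covratio=cvr;
run;
quit;

/* Flag observations using assumed rule-of-thumb cutoffs */
data flagged;
   set check2;
   flag_cookd    = (cooksd > 4/&numobs);
   flag_dffits   = (abs(dffits) > 2*sqrt(&numparms/&numobs));
   flag_covratio = (abs(cvr - 1) >= 3*&numparms/&numobs);
   if flag_cookd or flag_dffits or flag_covratio;   /* keep flagged rows only */
run;

proc print data=flagged;
   var model price cooksd dffits cvr flag_cookd flag_dffits flag_covratio;
run;
```

Keeping one indicator per statistic makes it easy to see which observations are flagged by several measures at once, which is usually a stronger signal than any single cutoff.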
INFLUENTIAL OBSERVATIONS PLOTS: RSTUDENT & LEVERAGE
INFLUENTIAL OBSERVATIONS PLOTS: COOK’S D & DFFITS
INFLUENTIAL OBSERVATIONS PLOTS: DFBETAS & COVRATIO
HOMEWORK
• Same as in the lecture
• POLYNOMIAL REGRESSION
• PROC GLMSELECT
• BOX‐COX ETC. TRANSFORMATION