SAS. Linear Regression (1185357)
LINEAR REGRESSION
Copyright © 2013, SAS Institute Inc. All rights reserved.

LINEAR REGRESSION: REGRESSION AND OTHER MODELS

                    Type of Predictors
Type of Response    Categorical                      Continuous                                 Continuous and Categorical
Continuous          Analysis of Variance (ANOVA)     Ordinary Least Squares (OLS) Regression    Analysis of Covariance (ANCOVA)
Categorical         Contingency Table Analysis       Logistic Regression                        Logistic Regression
                    or Logistic Regression

LINEAR REGRESSION: …REGRESSIONS…
• linear / non-linear
• logistic
• OLS
• PLS
• LAR
• RIDGE
• LASSO
• LOESS
• ROBUST
• QUANTILE
• ...

RELATIONSHIP HEIGHT-WEIGHT – CORRELATION???
[Figure: plot; axis labels not recoverable]

MULTIPLE LINEAR REGRESSION

MULTIPLE LINEAR REGRESSION: MODEL
In general, you model the dependent variable, Y, as a linear function of k independent variables, X1 through Xk:
• $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon$
• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$    Linear?
• $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + \beta_4 X_2^2 + \varepsilon$    Nonlinear?
Both models are linear in the coefficients $\beta$, so both can be fit as linear regression models; the second is nonlinear only in the predictors.

MULTIPLE LINEAR REGRESSION: SAS/STAT SYNTAX & EXAMPLE DATA
    proc reg data=sasuser.fitness;
        model Oxygen_Consumption = RunTime Age Weight Run_Pulse
                                   Rest_Pulse Maximum_Pulse Performance;
    run;
    quit;
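
The quadratic model above can still be fit with PROC REG once the squared term exists as a column. The following is a minimal sketch, assuming the sasuser.fitness data set from the slides is available; the data set name work.fitness_sq and the variable RunTime_sq are hypothetical names introduced here for illustration.

```sas
/* Assumption: sasuser.fitness exists, as in the slides.            */
/* work.fitness_sq and RunTime_sq are hypothetical names.           */
data work.fitness_sq;
    set sasuser.fitness;
    RunTime_sq = RunTime**2;   /* squared predictor */
run;

proc reg data=work.fitness_sq;
    /* Quadratic in RunTime, but still linear in the coefficients */
    QUAD: model Oxygen_Consumption = RunTime RunTime_sq;
run;
quit;
```

Creating the squared term in a DATA step keeps the MODEL statement a plain list of regressors, which is what PROC REG expects.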

MULTIPLE LINEAR REGRESSION: APPLICATIONS: PREDICTION VS. EXPLANATORY ANALYSIS
Prediction
• The terms in the model, the values of their coefficients, and their statistical significance are of secondary importance.
• The focus is on producing a model that is the best at predicting future values of Y as a function of the Xs. The predicted value of Y is given by this formula:
  $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \dots + \hat{\beta}_k X_k$
Explanatory analysis
• The focus is on understanding the relationship between the dependent variable and the independent variables.
• Consequently, the statistical significance of the coefficients is important as well as the magnitudes and signs of the coefficients.
  $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \dots + \hat{\beta}_k X_k$

SIMPLE LINEAR REGRESSION: MODEL
[Figure labels]
• Unknown relationship: $Y = \beta_0 + \beta_1 X$
• Regression best fit line: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
• Residual: $Y - \hat{Y}$
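
The predicted values and residuals described above can be saved to a data set with the OUTPUT statement of PROC REG. This is a sketch under the assumption that sasuser.fitness is available; the output data set work.fit_out and the variable names Predicted and Residual are chosen here for illustration.

```sas
/* Assumption: sasuser.fitness exists, as in the slides.              */
/* work.fit_out, Predicted, and Residual are illustrative names.      */
proc reg data=sasuser.fitness;
    model Oxygen_Consumption = RunTime;
    /* P= stores the predicted values, R= stores the residuals */
    output out=work.fit_out p=Predicted r=Residual;
run;
quit;

/* Inspect the first few observed, predicted, and residual values */
proc print data=work.fit_out(obs=5);
    var Oxygen_Consumption RunTime Predicted Residual;
run;
```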

SIMPLE LINEAR REGRESSION: THE BASELINE MODEL
[Figure: the baseline model, $\bar{Y}$]

SIMPLE LINEAR REGRESSION: VARIABILITY
[Figure: total, explained, and unexplained variability relative to $\bar{Y}$ and the fitted line $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$]

MULTIPLE LINEAR REGRESSION: HYPOTHESES
Null Hypothesis:
• The regression model does not fit the data better than the baseline model.
• $\beta_1 = \beta_2 = \dots = \beta_k = 0$ (F-statistic)
• Also $\beta_i = 0$ for each predictor (t-statistic)
Alternative Hypothesis:
• The regression model does fit the data better than the baseline model.
• Not all $\beta_i$ equal zero.
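
The variability decomposition and the test statistics named above can be written out as follows. This is the standard textbook formulation rather than something taken from the slides; the symbols $SS_T$, $SS_M$, $SS_E$, and $n$ (number of observations) are notation introduced here.

```latex
% Total = explained + unexplained variability
SS_T = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \quad
SS_M = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \quad
SS_E = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad
SS_T = SS_M + SS_E

% Overall test of H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0
F = \frac{SS_M / k}{SS_E / (n - k - 1)}

% Test of H_0: \beta_i = 0 for a single predictor
t = \frac{\hat{\beta}_i}{\mathrm{SE}(\hat{\beta}_i)}
```

A large F indicates that the regression model explains much more variability than the baseline model $\bar{Y}$.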

MULTIPLE LINEAR REGRESSION: MODEL DEVELOPMENT PROCESS
(1) Exploratory Data Analysis
(2) Candidate Model Selection
(3) Model Assumption Validation
(4) Collinearity and Influential Observation Detection
    Yes: (5) Model Revision
    No: (6) Prediction Testing

(2) CANDIDATE MODEL SELECTION
MULTIPLE LINEAR REGRESSION

CANDIDATE MODEL SELECTION: MODEL SELECTION
MODEL SELECTION OPTIONS
The SELECTION= option in the MODEL statement of PROC REG supports these model selection techniques:
• Stepwise selection methods
    • STEPWISE, FORWARD, or BACKWARD
• All-possible regressions ranked using
    • RSQUARE, ADJRSQ, or CP
• MINR, MAXR [homework]
• SELECTION=NONE is the default.

CANDIDATE MODEL SELECTION: ALL-POSSIBLE REGRESSIONS

Variables in Full Model (k)    Total Number of Subset Models (2^k)
0                              1
1                              2
2                              4
3                              8
4                              16
5                              32

    ods graphics / imagemap=on;
    proc reg data=sasuser.fitness plots(only)=(rsquare adjrsq cp);
        ALL_REG: model oxygen_consumption = Performance RunTime Age Weight
                       Run_Pulse Rest_Pulse Maximum_Pulse
                       / selection=rsquare adjrsq cp best=10;
        title 'Best Models Using All-Regression Option';
    run;
    quit;

CANDIDATE MODEL SELECTION: ALL-POSSIBLE REGRESSIONS: RANK
$R^2_{ADJ} = 1 - \frac{(n - i)(1 - R^2)}{n - p}$
where n is the number of observations, p is the number of parameters including the intercept, and i = 1 if the model includes an intercept (0 otherwise).

MODEL SELECTION: MALLOWS' CP
• Look for models with max p: $C_p \le p$, where p = number of parameters, including the intercept.

HOCKING'S CRITERION VERSUS MALLOWS' CP
• Hocking (1976) suggests selecting a model based on the following:
    • $C_p \le p$ for prediction
    • $C_p \le 2p - p_{full} + 1$ for parameter estimation

MODEL SELECTION: MALLOWS' CP
[Figure not recoverable]
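
As a worked check of these criteria, the first two rows of the all-possible-regressions table in these slides can be plugged into the formulas. Two assumptions are made here: the comparison symbol lost in extraction is $\le$, and the sample size is n = 31, which the slides do not state but which reproduces the tabulated adjusted R-square.

```latex
% Assumption: n = 31 observations (not stated in the slides).
% Model 1: 4 predictors plus intercept, so p = 5, i = 1, R^2 = 0.8355.
R^2_{ADJ} = 1 - \frac{(31 - 1)(1 - 0.8355)}{31 - 5}
          = 1 - \frac{4.935}{26} \approx 0.8102

% Hocking's criterion, with p_{full} = 8 (7 predictors plus intercept).
% Model 1 (p = 5, C_p = 4.0004):
C_p = 4.0004 \le p = 5                  % meets the prediction criterion
C_p = 4.0004 >  2(5) - 8 + 1 = 3        % fails the parameter-estimation criterion

% Model 2 (p = 6, C_p = 4.2598):
C_p = 4.2598 \le 2(6) - 8 + 1 = 5       % meets the parameter-estimation criterion
```

Under these assumptions, the four-variable model satisfies the prediction criterion, while the five-variable model is the smallest one listed that satisfies the parameter-estimation criterion.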

MODEL SELECTION: ALL-POSSIBLE REGRESSIONS RANKED USING

Model   Number in                        Adjusted
Index   Model       C(p)     R-Square    R-Square   Variables in Model
1       4           4.0004   0.8355      0.8102     RunTime Age Run_Pulse Maximum_Pulse
2       5           4.2598   0.8469      0.8163     RunTime Age Weight Run_Pulse Maximum_Pulse
3       5           4.7158   0.8439      0.8127     Performance RunTime Weight Run_Pulse Maximum_Pulse
4       5           4.7168   0.8439      0.8127     Performance RunTime Age Run_Pulse Maximum_Pulse
5       4           4.9567   0.8292      0.8029     Performance RunTime Run_Pulse Maximum_Pulse
6       3           5.8570   0.8101      0.7890     RunTime Run_Pulse Maximum_Pulse
7       3           5.9367   0.8096      0.7884     RunTime Age Run_Pulse
8       5           5.9783   0.8356      0.8027     RunTime Age Run_Pulse Rest_Pulse Maximum_Pulse
9       5           5.9856   0.8356      0.8027     Performance Age Weight Run_Pulse Maximum_Pulse
10      6           6.0492   0.8483      0.8104     Performance RunTime Age Weight Run_Pulse Maximum_Pulse
11      6           6.1758   0.8475      0.8094     RunTime Age Weight Run_Pulse Rest_Pulse Maximum_Pulse
12      6           6.6171   0.8446      0.8057     Performance RunTime Weight Run_Pulse Rest_Pulse Maximum_Pulse
13      6           6.7111   0.8440      0.8049     Performance RunTime Age Run_Pulse Rest_Pulse Maximum_Pulse
…       …           …        …           …          …

MODEL SELECTION: STEPWISE SELECTION METHODS
• FORWARD SELECTION
• BACKWARD ELIMINATION
• STEPWISE SELECTION

FORWARD SELECTION
[Figure: steps 0, 1, 2, 3, 4, 5, Stop]
SLENTRY=value | SLE=value
SLSTAY=value | SLS=value

MODEL SELECTION: STEPWISE SELECTION METHODS
    proc reg data=sasuser.fitness plots(only)=adjrsq;
        FORWARD:  model oxygen_consumption = Performance RunTime Age Weight
                        Run_Pulse Rest_Pulse Maximum_Pulse
                        / selection=forward;
        BACKWARD: model oxygen_consumption = Performance RunTime Age Weight
                        Run_Pulse Rest_Pulse Maximum_Pulse
                        / selection=backward;
        STEPWISE: model oxygen_consumption = Performance RunTime Age Weight
                        Run_Pulse Rest_Pulse Maximum_Pulse
                        / selection=stepwise;
        title 'Best Models Using Stepwise Selection';
    run;
    quit;

(3) MODEL ASSUMPTION VALIDATION
MULTIPLE LINEAR REGRESSION

MULTIPLE LINEAR REGRESSION: ASSUMPTIONS
• The mean of the Ys is accurately modeled by a linear function of the Xs.
• The assumptions for linear regression are that the error terms are independent and normally distributed with equal variance:
  $\varepsilon \sim \text{iid } N(0, \sigma^2)$
• Therefore, evaluating model assumptions for linear regression includes checking for
    • independent observations
    • normally distributed error terms
    • constant variance

ASSUMPTIONS: INDEPENDENCE
• Know the source of your data: correlated errors can arise from data gathered over time, repeated measures, clustered data, or data from complex survey designs.
• For time-series data, check that the errors are independent by examining
    • plots of residuals versus time or other ordering component
    • Durbin-Watson statistic or the first-order autocorrelation statistic for time-series data
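
The Durbin-Watson check mentioned above can be requested directly in the MODEL statement of PROC REG. The sketch below reuses sasuser.fitness only to show the syntax; it assumes the observations are stored in a meaningful time order, which the slides do not claim for this data set, and the choice of predictors is illustrative.

```sas
/* Assumption: rows are in time order; otherwise the statistic is not meaningful. */
proc reg data=sasuser.fitness;
    /* DW prints the Durbin-Watson statistic and first-order autocorrelation; */
    /* DWPROB adds p-values for the test.                                     */
    model Oxygen_Consumption = RunTime Age Run_Pulse Maximum_Pulse / dw dwprob;
run;
quit;
```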

WHEN THE INDEPENDENCE ASSUMPTION IS VIOLATED
Use the appropriate modeling tools to account for correlated observations:
• PROC MIXED, PROC GENMOD, or PROC GLIMMIX for repeated measures data
• PROC AUTOREG or PROC ARIMA in SAS/ETS for time-series data – [NEXT SAS COURSE]
• PROC SURVEYREG for survey data

ASSUMPTIONS: NORMALITY
Check that the error terms are normally distributed by examining:
• a histogram of the residuals
• a normal probability plot of the residuals
• tests for normality

WHEN THE NORMALITY ASSUMPTION IS VIOLATED
• Transform the dependent variable
• Fit a generalized linear model using PROC GENMOD or PROC GLIMMIX with the appropriate DIST= and LINK= option.

ASSUMPTIONS: NORMALITY
    proc reg data=sasuser.cars2 plots=all;
        model price = hwympg hwympg2 horsepower;
    run;
Also, formal test for normality in proc univariate

ASSUMPTIONS: CONSTANT VARIANCE
Check for constant variance of the error terms by examining:
• plot of residuals versus predicted values
• plots of residuals versus the independent variables
• test for heteroscedasticity
• Spearman rank correlation coefficient between absolute values of the residuals and predicted values.
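
The normality and constant-variance checks listed above might be run along the following lines. This is a sketch, assuming the sasuser.cars2 data set and model from the slides; the data sets work.resid and work.resid_abs and the variables Predicted, Residual, and AbsResidual are hypothetical names introduced here.

```sas
/* Assumption: sasuser.cars2 exists, as in the slides.                    */
/* work.resid, work.resid_abs, and the new variable names are illustrative. */
proc reg data=sasuser.cars2;
    /* SPEC requests a specification test that is sensitive to heteroscedasticity */
    model price = hwympg hwympg2 horsepower / spec;
    output out=work.resid p=Predicted r=Residual;
run;
quit;

/* Formal normality tests, histogram, and normal Q-Q plot of the residuals */
proc univariate data=work.resid normal;
    var Residual;
    histogram Residual / normal;
    qqplot Residual / normal(mu=est sigma=est);
run;

/* Spearman rank correlation between |residual| and predicted value */
data work.resid_abs;
    set work.resid;
    AbsResidual = abs(Residual);
run;

proc corr data=work.resid_abs spearman;
    var AbsResidual Predicted;
run;
```

A Spearman correlation near zero is consistent with constant variance, while a clearly positive value suggests that the residual spread grows with the predicted value.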