The Elements of Statistical Learning: Data Mining, Inference, and Prediction, page 18

Here $\mathbf{Q}$ is an $N \times (p + 1)$ orthogonal matrix, $\mathbf{Q}^T \mathbf{Q} = \mathbf{I}$, and $\mathbf{R}$ is a $(p + 1) \times (p + 1)$ upper triangular matrix.

The QR decomposition represents a convenient orthogonal basis for the column space of $\mathbf{X}$. It is easy to see, for example, that the least squares solution is given by

$$
\begin{aligned}
\hat{\beta} &= \mathbf{R}^{-1} \mathbf{Q}^T \mathbf{y}, && (3.32) \\
\hat{\mathbf{y}} &= \mathbf{Q} \mathbf{Q}^T \mathbf{y}. && (3.33)
\end{aligned}
$$

Equation (3.32) is easy to solve because $\mathbf{R}$ is upper triangular (Exercise 3.4).

3. Linear Methods for Regression

3.2.4 Multiple Outputs

Suppose we have multiple outputs $Y_1, Y_2, \ldots, Y_K$ that we wish to predict from our inputs $X_0, X_1, X_2, \ldots, X_p$. We assume a linear model for each output

$$
\begin{aligned}
Y_k &= \beta_{0k} + \sum_{j=1}^{p} X_j \beta_{jk} + \varepsilon_k && (3.34) \\
&= f_k(X) + \varepsilon_k. && (3.35)
\end{aligned}
$$

With $N$ training cases we can write the model in matrix notation

$$
\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}. \qquad (3.36)
$$

Here $\mathbf{Y}$ is the $N \times K$ response matrix, with $ik$ entry $y_{ik}$, $\mathbf{X}$ is the $N \times (p + 1)$ input matrix, $\mathbf{B}$ is the $(p + 1) \times K$ matrix of parameters and $\mathbf{E}$ is the $N \times K$ matrix of errors. A straightforward generalization of the univariate loss function (3.2) is

$$
\begin{aligned}
\mathrm{RSS}(\mathbf{B}) &= \sum_{k=1}^{K} \sum_{i=1}^{N} (y_{ik} - f_k(x_i))^2 && (3.37) \\
&= \mathrm{tr}[(\mathbf{Y} - \mathbf{X}\mathbf{B})^T (\mathbf{Y} - \mathbf{X}\mathbf{B})]. && (3.38)
\end{aligned}
$$

The least squares estimates have exactly the same form as before

$$
\hat{\mathbf{B}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}. \qquad (3.39)
$$

Hence the coefficients for the $k$th outcome are just the least squares estimates in the regression of $\mathbf{y}_k$ on $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_p$. Multiple outputs do not affect one another’s least squares estimates.

If the errors $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_K)$ in (3.34) are correlated, then it might seem appropriate to modify (3.37) in favor of a multivariate version. Specifically, suppose $\mathrm{Cov}(\varepsilon) = \Sigma$, then the multivariate weighted criterion

$$
\mathrm{RSS}(\mathbf{B}; \Sigma) = \sum_{i=1}^{N} (y_i - f(x_i))^T \Sigma^{-1} (y_i - f(x_i)) \qquad (3.40)
$$

arises naturally from multivariate Gaussian theory. Here $f(x)$ is the vector function $(f_1(x), \ldots, f_K(x))^T$, and $y_i$ the vector of $K$ responses for observation $i$. However, it can be shown that again the solution is given by (3.39); $K$ separate regressions that ignore the correlations (Exercise 3.11). If the $\Sigma_i$ vary among observations, then this is no longer the case, and the solution for $\mathbf{B}$ no longer decouples.

In Section 3.7 we pursue the multiple outcome problem, and consider situations where it does pay to combine the regressions.

3.3 Subset Selection

There are two reasons why we are often not satisfied with the least squares estimates (3.6).

• The first is prediction accuracy: the least squares estimates often have low bias but large variance. Prediction accuracy can sometimes be improved by shrinking or setting some coefficients to zero. By doing so we sacrifice a little bit of bias to reduce the variance of the predicted values, and hence may improve the overall prediction accuracy.

• The second reason is interpretation. With a large number of predictors, we often would like to determine a smaller subset that exhibits the strongest effects. In order to get the “big picture,” we are willing to sacrifice some of the small details.

In this section we describe a number of approaches to variable subset selection with linear regression.

In later sections we discuss shrinkage and hybrid approaches for controlling variance, as well as other dimension-reduction strategies. These all fall under the general heading model selection. Model selection is not restricted to linear models; Chapter 7 covers this topic in some detail.

With subset selection we retain only a subset of the variables, and eliminate the rest from the model. Least squares regression is used to estimate the coefficients of the inputs that are retained. There are a number of different strategies for choosing the subset.

3.3.1 Best-Subset Selection

Best subset regression finds for each $k \in \{0, 1, 2, \ldots, p\}$ the subset of size $k$ that gives smallest residual sum of squares (3.2). An efficient algorithm, the leaps and bounds procedure (Furnival and Wilson, 1974), makes this feasible for $p$ as large as 30 or 40.
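To make the definition concrete, the following is a minimal sketch of best-subset search by brute force, assuming NumPy and a small synthetic data set. It is not the leaps and bounds procedure: it simply fits every subset, so it is only practical for small $p$. The function names `rss` and `best_subsets` and the simulated data are illustrative assumptions, not part of the text.

```python
from itertools import combinations

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of the least squares fit on intercept + cols."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def best_subsets(X, y):
    """For each size k = 0..p, return (best column tuple, its RSS)."""
    p = X.shape[1]
    result = {}
    for k in range(p + 1):
        # Exhaustive search: one least squares fit per subset of size k.
        result[k] = min(
            ((cols, rss(X, y, cols)) for cols in combinations(range(p), k)),
            key=lambda pair: pair[1],
        )
    return result


# Synthetic data (an assumption; this is not the prostate cancer data).
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(60)

best = best_subsets(X, y)
for k, (cols, value) in best.items():
    print(k, cols, round(value, 2))
```

The printed residual sums of squares cannot increase with $k$, since the best subset of size $k + 1$ is at least as good as the best subset of size $k$ with one variable added.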

Figure 3.5 shows all the subset models for the prostate cancer example. The lower boundary represents the models that are eligible for selection by the best-subsets approach. Note that the best subset of size 2, for example, need not include the variable that was in the best subset of size 1 (for this example all the subsets are nested). The best-subset curve (red lower boundary in Figure 3.5) is necessarily decreasing, so cannot be used to select the subset size $k$. The question of how to choose $k$ involves the tradeoff between bias and variance, along with the more subjective desire for parsimony. There are a number of criteria that one may use; typically we choose the smallest model that minimizes an estimate of the expected prediction error.

Many of the other approaches that we discuss in this chapter are similar, in that they use the training data to produce a sequence of models varying in complexity and indexed by a single parameter.

In the next section we use cross-validation to estimate prediction error and select $k$; the AIC criterion is a popular alternative. We defer more detailed discussion of these and other approaches to Chapter 7.

FIGURE 3.5. All possible subset models for the prostate cancer example. At each subset size is shown the residual sum-of-squares for each model of that size. (Axes: subset size $k$, horizontal; residual sum-of-squares, vertical.)

3.3.2 Forward- and Backward-Stepwise Selection

Rather than search through all possible subsets (which becomes infeasible for $p$ much larger than 40), we can seek a good path through them.

Forward-stepwise selection starts with the intercept, and then sequentially adds into the model the predictor that most improves the fit. With many candidate predictors, this might seem like a lot of computation; however, clever updating algorithms can exploit the QR decomposition for the current fit to rapidly establish the next candidate (Exercise 3.9). Like best-subset regression, forward stepwise produces a sequence of models indexed by $k$, the subset size, which must be determined.

Forward-stepwise selection is a greedy algorithm, producing a nested sequence of models.
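The greedy search just described can be sketched as follows, assuming NumPy and synthetic data. For simplicity this version refits every candidate model from scratch instead of using the QR updating mentioned above, so it illustrates the selection rule rather than an efficient implementation. The names `fit_rss` and `forward_stepwise` are hypothetical.

```python
import numpy as np


def fit_rss(X, y, cols):
    """Residual sum of squares of the least squares fit on intercept + cols."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def forward_stepwise(X, y):
    """Return the order in which predictors enter, plus the RSS path."""
    p = X.shape[1]
    active, remaining = [], list(range(p))
    path = [fit_rss(X, y, active)]  # intercept-only model, k = 0
    while remaining:
        # Greedy step: add the single predictor that lowers RSS the most.
        best_j = min(remaining, key=lambda j: fit_rss(X, y, active + [j]))
        active.append(best_j)
        remaining.remove(best_j)
        path.append(fit_rss(X, y, active))
    return active, path


# Synthetic data (an assumption, chosen so that N > p).
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 6))
y = 3.0 * X[:, 2] + X[:, 5] + rng.standard_normal(80)

order, rss_path = forward_stepwise(X, y)
print(order)
print([round(v, 2) for v in rss_path])
```

Because each model contains the previous one, the first $k$ entries of `order` give the model of size $k$, which is the nested sequence referred to above.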

Because it is greedy, forward-stepwise selection might seem sub-optimal compared to best-subset selection. However, there are several reasons why it might be preferred:

• Computational; for large $p$ we cannot compute the best subset sequence, but we can always compute the forward stepwise sequence (even when $p \gg N$).

• Statistical; a price is paid in variance for selecting the best subset of each size; forward stepwise is a more constrained search, and will have lower variance, but perhaps more bias.

FIGURE 3.6. Comparison of four subset-selection techniques on a simulated linear regression problem $Y = X^T \beta + \varepsilon$. There are $N = 300$ observations on $p = 31$ standard Gaussian variables, with pairwise correlations all equal to 0.85. For 10 of the variables, the coefficients are drawn at random from a $N(0, 0.4)$ distribution; the rest are zero. The noise $\varepsilon \sim N(0, 6.25)$, resulting in a signal-to-noise ratio of 0.64. Results are averaged over 50 simulations. Shown is the mean-squared error of the estimated coefficient $\hat{\beta}(k)$ at each step from the true $\beta$. (Axes: subset size $k$, horizontal; $E\|\hat{\beta}(k) - \beta\|^2$, vertical. Legend: Best Subset, Forward Stepwise, Backward Stepwise, Forward Stagewise.)

Backward-stepwise selection starts with the full model, and sequentially deletes the predictor that has the least impact on the fit. The candidate for dropping is the variable with the smallest Z-score (Exercise 3.10). Backward selection can only be used when $N > p$, while forward stepwise can always be used.

Figure 3.6 shows the results of a small simulation study to compare best-subset regression with the simpler alternatives forward and backward selection.
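The backward-stepwise rule described above can be sketched as follows, assuming NumPy, synthetic data, and $N > p + 1$ so that the full model can be fit. "Smallest Z-score" is read here as smallest in absolute value, with the Z-score computed as the coefficient divided by its estimated standard error. The names `z_scores` and `backward_stepwise` are hypothetical.

```python
import numpy as np


def z_scores(X, y, cols):
    """Z-scores of the non-intercept coefficients in the fit on cols."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    gram_inv = np.linalg.inv(design.T @ design)
    coef = gram_inv @ design.T @ y
    resid = y - design @ coef
    # Unbiased noise variance estimate: RSS / (N - number of fitted terms).
    sigma2 = resid @ resid / (n - design.shape[1])
    z = coef / np.sqrt(sigma2 * np.diag(gram_inv))
    return z[1:]  # skip the intercept


def backward_stepwise(X, y):
    """Return predictors in the order they are dropped from the full model."""
    active = list(range(X.shape[1]))
    dropped = []
    while active:
        z = z_scores(X, y, active)
        # Drop the variable whose Z-score is smallest in absolute value.
        worst = active[int(np.argmin(np.abs(z)))]
        active.remove(worst)
        dropped.append(worst)
    return dropped


# Synthetic data (an assumption, with N well above p).
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 6))
y = 3.0 * X[:, 2] + X[:, 5] + rng.standard_normal(80)

drop_order = backward_stepwise(X, y)
print(drop_order)
```

Reading `drop_order` from the end gives the variables that survive longest, so the model of size $k$ consists of its last $k$ entries.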

Best-subset, forward and backward selection perform very similarly, as is often the case. Included in the figure is forward stagewise regression (next section), which takes longer to reach minimum error.

On the prostate cancer example, best-subset, forward and backward selection all gave exactly the same sequence of terms.

Some software packages implement hybrid stepwise-selection strategies that consider both forward and backward moves at each step, and select the “best” of the two. For example, in the R package the step function uses the AIC criterion for weighing the choices, which takes proper account of the number of parameters fit; at each step an add or drop will be performed that minimizes the AIC score.

Other more traditional packages base the selection on $F$-statistics, adding “significant” terms, and dropping “non-significant” terms. These are out of fashion, since they do not take proper account of the multiple testing issues. It is also tempting after a model search to print out a summary of the chosen model, such as in Table 3.2; however, the standard errors are not valid, since they do not account for the search process. The bootstrap (Section 8.2) can be useful in such settings.

Finally, we note that often variables come in groups (such as the dummy variables that code a multi-level categorical predictor). Smart stepwise procedures (such as step in R) will add or drop whole groups at a time, taking proper account of their degrees-of-freedom.

3.3.3 Forward-Stagewise Regression

Forward-stagewise regression (FS) is even more constrained than forward-stepwise regression. It starts like forward-stepwise regression, with an intercept equal to $\bar{y}$, and centered predictors with coefficients initially all 0. At each step the algorithm identifies the variable most correlated with the current residual. It then computes the simple linear regression coefficient of the residual on this chosen variable, and then adds it to the current coefficient for that variable.

This is continued till none of the variables have correlation with the residuals, i.e., the least-squares fit when $N > p$.

Unlike forward-stepwise regression, none of the other variables are adjusted when a term is added to the model. As a consequence, forward stagewise can take many more than $p$ steps to reach the least squares fit, and historically has been dismissed as being inefficient. It turns out that this “slow fitting” can pay dividends in high-dimensional problems.
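The forward-stagewise steps described above can be sketched as follows, assuming NumPy and synthetic data. The stopping tolerance `tol`, the cap `max_steps`, and the function name `forward_stagewise` are assumptions made for illustration; the text only says the procedure continues until no variable is correlated with the residual.

```python
import numpy as np


def forward_stagewise(X, y, tol=1e-4, max_steps=100_000):
    """Forward-stagewise fit; returns intercept, coefficients and step count."""
    Xc = X - X.mean(axis=0)          # centered predictors
    intercept = y.mean()             # intercept equal to the mean of y
    beta = np.zeros(X.shape[1])      # coefficients start at 0
    resid = y - intercept
    norms = np.linalg.norm(Xc, axis=0)
    steps = 0
    while steps < max_steps:
        rnorm = np.linalg.norm(resid)
        if rnorm == 0.0:
            break
        # Correlation of each predictor with the current residual.
        corr = Xc.T @ resid / (norms * rnorm)
        j = int(np.argmax(np.abs(corr)))
        if abs(corr[j]) < tol:
            break
        # Simple regression coefficient of the residual on predictor j.
        delta = Xc[:, j] @ resid / (Xc[:, j] @ Xc[:, j])
        beta[j] += delta             # only coefficient j changes
        resid = resid - delta * Xc[:, j]
        steps += 1
    return intercept, beta, steps


# Synthetic data (an assumption, with N > p).
rng = np.random.default_rng(2)
X = rng.standard_normal((80, 6))
y = 3.0 * X[:, 2] + X[:, 5] + rng.standard_normal(80)

intercept, beta, steps = forward_stagewise(X, y)
print(steps, np.round(beta, 3))
```

Since only one coefficient moves per step, the same variable can be selected many times, which is why the step count can be much larger than $p$ before the fit approaches least squares.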

We see in Section 3.8.1 that both forward stagewise and a variant which is slowed down even further are quite competitive, especially in very high-dimensional problems.

Forward-stagewise regression is included in Figure 3.6. In this example it takes over 1000 steps to get all the correlations below $10^{-4}$.

For subset size $k$, we plotted the error for the last step for which there were $k$ nonzero coefficients. Although it catches up with the best fit, it takes longer to do so.

3.3.4 Prostate Cancer Data Example (Continued)

Table 3.3 shows the coefficients from a number of different selection and shrinkage methods. They are best-subset selection using an all-subsets search, ridge regression, the lasso, principal components regression and partial least squares.

Each method has a complexity parameter, and this was chosen to minimize an estimate of prediction error based on tenfold cross-validation; full details are given in Section 7.10. Briefly, cross-validation works by dividing the training data randomly into ten equal parts. The learning method is fit, for a range of values of the complexity parameter, to nine-tenths of the data, and the prediction error is computed on the remaining one-tenth. This is done in turn for each one-tenth of the data, and the ten prediction error estimates are averaged.
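The tenfold procedure just outlined can be sketched as follows, assuming NumPy and synthetic data. Ridge regression with penalty `lam` is used here only as a convenient stand-in for "a learning method with a complexity parameter"; the grid of penalties, the simulated data, and the names `ridge_fit` and `cv_error` are assumptions and do not reproduce the analysis behind Table 3.3.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge fit on centered data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta


def cv_error(X, y, lam, n_folds=10, seed=0):
    """Average held-out mean squared error over n_folds random folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for held_out in folds:
        # Fit on the other nine-tenths, score on the held-out tenth.
        train = np.setdiff1d(np.arange(len(y)), held_out)
        b0, beta = ridge_fit(X[train], y[train], lam)
        pred = b0 + X[held_out] @ beta
        errors.append(np.mean((y[held_out] - pred) ** 2))
    return float(np.mean(errors))


# Synthetic data and penalty grid (both assumptions).
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 8))
y = X[:, 0] - 2.0 * X[:, 4] + rng.standard_normal(100)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_error(X, y, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
print(scores)
print(best_lam)
```

Using the same `seed` for every value of `lam` keeps the fold assignment fixed, so the cross-validation estimates are compared on identical splits.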
