The Elements of Statistical Learning: Data Mining, Inference, and Prediction (811377), page 19


From this we obtain an estimated prediction error curve as a function of the complexity parameter.

Note that we have already divided these data into a training set of size 67 and a test set of size 30. Cross-validation is applied to the training set, since selecting the shrinkage parameter is part of the training process. The test set is there to judge the performance of the selected model.

The estimated prediction error curves are shown in Figure 3.7. Many of the curves are very flat over large ranges near their minimum. Included are estimated standard error bands for each estimated error rate, based on the ten error estimates computed by cross-validation. We have used the "one-standard-error" rule: we pick the most parsimonious model within one standard error of the minimum (Section 7.10, page 244).
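As a rough illustration of the one-standard-error rule, the sketch below picks a model from per-fold cross-validation errors. The function name `one_standard_error_choice`, the error values, and the use of the sample standard deviation divided by the square root of the number of folds as the standard error are all assumptions made for this example, not details taken from the text.

```python
import math
import statistics


def one_standard_error_choice(fold_errors):
    """Return the index of the least complex model within one SE of the best.

    fold_errors is ordered from least to most complex model; each entry is
    the list of per-fold cross-validation errors for that model.
    """
    means = [statistics.mean(errs) for errs in fold_errors]
    # Standard error of the mean across folds (an assumed, common convention).
    ses = [statistics.stdev(errs) / math.sqrt(len(errs)) for errs in fold_errors]
    best = min(range(len(means)), key=means.__getitem__)
    threshold = means[best] + ses[best]
    # Models are ordered by increasing complexity, so the first model at or
    # below the threshold is the most parsimonious acceptable one.
    for idx, mean_err in enumerate(means):
        if mean_err <= threshold:
            return idx
    return best


# Hypothetical per-fold errors for three models of increasing complexity.
cv_errors = [
    [1.00, 1.10, 0.90, 1.05, 0.95],
    [0.60, 0.70, 0.50, 0.65, 0.55],
    [0.58, 0.68, 0.48, 0.63, 0.53],  # lowest mean error
]
chosen = one_standard_error_choice(cv_errors)
```

Here the third model has the lowest mean error, but the second one lies within one standard error of it and is simpler, so the rule selects the second model.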

Such a rule acknowledges the fact that the tradeoff curve is estimated with error, and hence takes a conservative approach.

Best-subset selection chose to use the two predictors lcavol and lweight. The last two lines of Table 3.3 give the average prediction error (and its estimated standard error) over the test set.

3.4 Shrinkage Methods

By retaining a subset of the predictors and discarding the rest, subset selection produces a model that is interpretable and has possibly lower prediction error than the full model. However, because it is a discrete process (variables are either retained or discarded), it often exhibits high variance, and so doesn't reduce the prediction error of the full model. Shrinkage methods are more continuous, and don't suffer as much from high variability.

3.4.1 Ridge Regression

Ridge regression shrinks the regression coefficients by imposing a penalty on their size.

FIGURE 3.7. Estimated prediction error curves and their standard errors for the various selection and shrinkage methods. Each curve is plotted as a function of the corresponding complexity parameter for that method. The horizontal axis has been chosen so that the model complexity increases as we move from left to right. The estimates of prediction error and their standard errors were obtained by tenfold cross-validation; full details are given in Section 7.10. The least complex model within one standard error of the best is chosen, indicated by the purple vertical broken lines.
[Figure not reproduced. Panels, each showing CV Error on the vertical axis: All Subsets (vs. Subset Size), Ridge Regression (vs. Degrees of Freedom), Lasso (vs. Shrinkage Factor s), Principal Components Regression (vs. Number of Directions), Partial Least Squares (vs. Number of Directions).]

TABLE 3.3. Estimated coefficients and test error results, for different subset and shrinkage methods applied to the prostate data. The blank entries correspond to variables omitted.

Term          LS       Best Subset   Ridge    Lasso    PCR      PLS
Intercept     2.465    2.477         2.452    2.468    2.497    2.452
lcavol        0.680    0.740         0.420    0.533    0.543    0.419
lweight       0.263    0.316         0.238    0.169    0.289    0.344
age          −0.141                 −0.046            −0.152   −0.026
lbph          0.210                  0.162    0.002    0.214    0.220
svi           0.305                  0.227    0.094    0.315    0.243
lcp          −0.288                  0.000            −0.051    0.079
gleason      −0.021                  0.040             0.232    0.011
pgg45         0.267                  0.133            −0.056    0.084
Test Error    0.521    0.492         0.492    0.479    0.449    0.528
Std Error     0.179    0.143         0.165    0.164    0.105    0.152

The ridge coefficients minimize a penalized residual sum of squares,

$$\hat{\beta}^{\text{ridge}} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}. \tag{3.41}$$

Here $\lambda \ge 0$ is a complexity parameter that controls the amount of shrinkage: the larger the value of $\lambda$, the greater the amount of shrinkage.

The coefficients are shrunk toward zero (and each other). The idea of penalizing by the sum-of-squares of the parameters is also used in neural networks, where it is known as weight decay (Chapter 11).

An equivalent way to write the ridge problem is

$$\hat{\beta}^{\text{ridge}} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t, \tag{3.42}$$

which makes explicit the size constraint on the parameters. There is a one-to-one correspondence between the parameters $\lambda$ in (3.41) and $t$ in (3.42).

When there are many correlated variables in a linear regression model, their coefficients can become poorly determined and exhibit high variance. A wildly large positive coefficient on one variable can be canceled by a similarly large negative coefficient on its correlated cousin. By imposing a size constraint on the coefficients, as in (3.42), this problem is alleviated.

The ridge solutions are not equivariant under scaling of the inputs, and so one normally standardizes the inputs before solving (3.41). In addition, notice that the intercept $\beta_0$ has been left out of the penalty term. Penalization of the intercept would make the procedure depend on the origin chosen for $Y$; that is, adding a constant $c$ to each of the targets $y_i$ would not simply result in a shift of the predictions by the same amount $c$. It can be shown (Exercise 3.5) that the solution to (3.41) can be separated into two parts, after reparametrization using centered inputs: each $x_{ij}$ gets replaced by $x_{ij} - \bar{x}_j$. We estimate $\beta_0$ by $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$. The remaining coefficients get estimated by a ridge regression without intercept, using the centered $x_{ij}$.
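The two-part solution can be checked numerically. The sketch below is a minimal illustration on simulated data (the data, seed, and variable names are assumptions for this example): it solves (3.41) directly with an unpenalized intercept column, then compares the result with the centered-inputs route. Note that $\bar{y}$ is the intercept in the centered parametrization; mapping back to the raw inputs gives $\bar{y} - \sum_j \bar{x}_j \hat{\beta}_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, lam = 50, 3, 2.0
X = rng.normal(loc=5.0, size=(N, p))  # deliberately uncentered inputs
y = 1.5 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)

# Direct solution of the penalized problem: the intercept column gets no penalty.
A = np.column_stack([np.ones(N), X])
P = np.diag([0.0] + [1.0] * p)
theta = np.linalg.solve(A.T @ A + lam * P, A.T @ y)

# Two-part solution: center the inputs, then ridge regression without intercept.
x_bar = X.mean(axis=0)
Xc = X - x_bar
beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ y)
beta0_centered = y.mean()                       # intercept for centered inputs
beta0_original = beta0_centered - x_bar @ beta  # same fit, raw-input scale
```

Both routes give the same slope coefficients, and the intercepts agree once the centering shift is accounted for.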

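Henceforth we assume that this centering has been done, so that the input matrix $\mathbf{X}$ has $p$ (rather than $p + 1$) columns.

Writing the criterion in (3.41) in matrix form,

$$\text{RSS}(\lambda) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta) + \lambda \beta^T \beta, \tag{3.43}$$

the ridge regression solutions are easily seen to be

$$\hat{\beta}^{\text{ridge}} = (\mathbf{X}^T \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^T \mathbf{y}, \tag{3.44}$$

where $\mathbf{I}$ is the $p \times p$ identity matrix. Notice that with the choice of quadratic penalty $\beta^T \beta$, the ridge regression solution is again a linear function of $\mathbf{y}$.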
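A minimal sketch of the closed form (3.44), assuming centered simulated data; the helper names `ridge_fit` and `ridge_criterion` are hypothetical. It solves the linear system rather than forming the inverse explicitly, which is a common numerical choice and not something the text prescribes.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for centered X."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_criterion(X, y, beta, lam):
    """Penalized residual sum of squares, as in (3.43)."""
    r = y - X @ beta
    return r @ r + lam * (beta @ beta)


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
X = X - X.mean(axis=0)                       # centered inputs
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(size=40)
y = y - y.mean()                             # centered response

beta_ridge = ridge_fit(X, y, lam=3.0)
beta_ls = ridge_fit(X, y, lam=0.0)           # lam = 0 recovers least squares
```

On this example the ridge coefficient vector has a smaller norm than the least squares one, and doubling `y` doubles the ridge solution, which reflects the linearity in $\mathbf{y}$ noted above.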

The solution adds a positive constant to the diagonal of $\mathbf{X}^T \mathbf{X}$ before inversion. This makes the problem nonsingular, even if $\mathbf{X}^T \mathbf{X}$ is not of full rank, and was the main motivation for ridge regression when it was first introduced in statistics (Hoerl and Kennard, 1970). Traditional descriptions of ridge regression start with definition (3.44). We choose to motivate it via (3.41) and (3.42), as these provide insight into how it works.

Figure 3.8 shows the ridge coefficient estimates for the prostate cancer example, plotted as functions of $\text{df}(\lambda)$, the effective degrees of freedom implied by the penalty $\lambda$ (defined in (3.50) on page 68).
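The nonsingularity claim can be illustrated with a small simulated example in which one input column is an exact multiple of another (the data and the value of `lam` are assumptions for this sketch). The Gram matrix then has a zero eigenvalue, while adding $\lambda$ to the diagonal shifts every eigenvalue up by $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=20)
# The second column is exactly twice the first, so X^T X is rank deficient.
X = np.column_stack([x1, 2.0 * x1, rng.normal(size=20)])
X = X - X.mean(axis=0)
y = rng.normal(size=20)
y = y - y.mean()

lam = 0.1
gram = X.T @ X
ridge_matrix = gram + lam * np.eye(3)

eig_gram = np.linalg.eigvalsh(gram)           # ascending; smallest is ~0
eig_ridge = np.linalg.eigvalsh(ridge_matrix)  # every eigenvalue raised by lam

# The ridge system has a unique solution even though least squares does not.
beta = np.linalg.solve(ridge_matrix, X.T @ y)
```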

In the case of orthonormal inputs, the ridge estimates are just a scaled version of the least squares estimates, that is, $\hat{\beta}^{\text{ridge}} = \hat{\beta}/(1 + \lambda)$.

Ridge regression can also be derived as the mean or mode of a posterior distribution, with a suitably chosen prior distribution. In detail, suppose $y_i \sim N(\beta_0 + x_i^T \beta, \sigma^2)$, and the parameters $\beta_j$ are each distributed as $N(0, \tau^2)$, independently of one another. Then the (negative) log-posterior density of $\beta$, with $\tau^2$ and $\sigma^2$ assumed known, is equal (up to a positive scale factor and an additive constant) to the expression in curly braces in (3.41), with $\lambda = \sigma^2/\tau^2$ (Exercise 3.6). Thus the ridge estimate is the mode of the posterior distribution; since the distribution is Gaussian, it is also the posterior mean.

The singular value decomposition (SVD) of the centered input matrix $\mathbf{X}$ gives us some additional insight into the nature of ridge regression.
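Both statements can be checked on a small example. The sketch below builds a matrix with orthonormal columns through a QR decomposition (a construction chosen for this illustration), compares the ridge estimate with the scaled least squares estimate, and then compares it with the posterior mean from the standard Gaussian conjugate formula. The values of `sigma2` and `tau2` are hypothetical, and the intercept is ignored, as if the data were already centered.

```python
import numpy as np

rng = np.random.default_rng(3)
# Q has orthonormal columns, so Q^T Q = I.
Q, _ = np.linalg.qr(rng.normal(size=(30, 4)))
y = rng.normal(size=30)
lam = 2.5

beta_ls = Q.T @ y  # least squares estimate when Q^T Q = I
beta_ridge = np.linalg.solve(Q.T @ Q + lam * np.eye(4), Q.T @ y)

# Gaussian likelihood with variance sigma2 and N(0, tau2) prior on each beta_j.
sigma2, tau2 = 1.0, 0.4  # hypothetical values with sigma2 / tau2 == lam
post_precision = Q.T @ Q / sigma2 + np.eye(4) / tau2
post_mean = np.linalg.solve(post_precision, Q.T @ y / sigma2)
```

With these values, `beta_ridge` matches `beta_ls / (1 + lam)` and also matches `post_mean`, up to floating-point error.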

This decomposition is extremely useful in the analysis of many statistical methods.

FIGURE 3.8. Profiles of ridge coefficients for the prostate cancer example, as the tuning parameter $\lambda$ is varied. Coefficients are plotted versus $\text{df}(\lambda)$, the effective degrees of freedom. A vertical line is drawn at df = 5.0, the value chosen by cross-validation.
[Figure not reproduced. Vertical axis: Coefficients; horizontal axis: df(λ); one profile per predictor: lcavol, svi, lweight, pgg45, lbph, gleason, age, lcp.]

The SVD of the $N \times p$ matrix $\mathbf{X}$ has the form

$$\mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^T. \tag{3.45}$$

Here $\mathbf{U}$ and $\mathbf{V}$ are $N \times p$ and $p \times p$ orthogonal matrices, with the columns of $\mathbf{U}$ spanning the column space of $\mathbf{X}$, and the columns of $\mathbf{V}$ spanning the row space. $\mathbf{D}$ is a $p \times p$ diagonal matrix, with diagonal entries $d_1 \ge d_2 \ge \cdots \ge d_p \ge 0$ called the singular values of $\mathbf{X}$. If one or more values $d_j = 0$, $\mathbf{X}$ is singular.

Using the singular value decomposition we can write the least squares fitted vector as

$$\mathbf{X}\hat{\beta}^{\text{ls}} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} = \mathbf{U}\mathbf{U}^T \mathbf{y}, \tag{3.46}$$

after some simplification.
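The identity (3.46) is easy to confirm numerically. The following sketch, on simulated centered data (an assumption for this example), computes the thin SVD with NumPy and compares the least squares fit obtained from the normal equations with the projection $\mathbf{U}\mathbf{U}^T \mathbf{y}$.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(25, 3))
X = X - X.mean(axis=0)  # centered input matrix
y = rng.normal(size=25)

# Thin SVD: U is 25 x 3, d holds the singular values, Vt is V transposed.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

fit_normal_eq = X @ np.linalg.solve(X.T @ X, X.T @ y)  # X (X^T X)^{-1} X^T y
fit_svd = U @ (U.T @ y)                                # U U^T y
```

The two fitted vectors agree up to floating-point error, and `d` comes back sorted in decreasing order, matching the ordering of the singular values in (3.45).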
