In contrast, cross-validation overestimated the error by 1%, 4%, 0%, and 4%, with the bootstrap doing about the same. Hence the extra work involved in computing a cross-validation or bootstrap measure is worthwhile, if an accurate estimate of test error is required. With other fitting methods like trees, cross-validation and the bootstrap can underestimate the true error by 10%, because the search for the best tree is strongly affected by the validation set. In these situations only a separate test set will provide an unbiased estimate of test error.

7.12 Conditional or Expected Test Error?

Figures 7.14 and 7.15 examine the question of whether cross-validation does a good job in estimating ErrT, the error conditional on a given training set T (expression (7.15) on page 228), as opposed to the expected test error. For each of 100 training sets generated from the "reg/linear" setting in the top-right panel of Figure 7.3, Figure 7.14 shows the conditional error curves ErrT as a function of subset size (top left).
The next two panels show 10-fold and N-fold cross-validation, the latter also known as leave-one-out (LOO). The thick red curve in each plot is the expected error Err, while the thick black curves are the expected cross-validation curves.
The lower right panel shows how well cross-validation approximates the conditional and expected error.

One might have expected N-fold CV to approximate ErrT well, since it almost uses the full training sample to fit a new test point. 10-fold CV, on the other hand, might be expected to estimate Err well, since it averages over somewhat different training sets. From the figure it appears 10-fold does a better job than N-fold in estimating ErrT, and estimates Err even better. Indeed, the similarity of the two black curves with the red curve suggests both CV curves are approximately unbiased for Err, with 10-fold having less variance.
Similar trends were reported by Efron (1983).

Figure 7.15 shows scatterplots of both 10-fold and N-fold cross-validation error estimates versus the true conditional error for the 100 simulations. Although the scatterplots do not indicate much correlation, the lower right panel shows that for the most part the correlations are negative, a curious phenomenon that has been observed before. This negative correlation explains why neither form of CV estimates ErrT well.
The broken lines in each plot are drawn at Err(p), the expected error for the best subset of size p. We see again that both forms of CV are approximately unbiased for expected error, but the variation in test error for different training sets is quite substantial.

Among the four experimental conditions in Figure 7.3, this "reg/linear" scenario showed the highest correlation between actual and predicted test error. This phenomenon also occurs for bootstrap estimates of error, and we would guess, for any other estimate of conditional prediction error.

We conclude that estimation of test error for a particular training set is not easy in general, given just the data from that same training set. Instead, cross-validation and related methods may provide reasonable estimates of the expected error Err.
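To make the comparison concrete, here is a minimal simulation sketch in the spirit of Figures 7.14 and 7.15, assuming a generic Gaussian linear model rather than the exact "reg/linear" design of Figure 7.3 (the sizes N, p and the noise level are illustrative choices, not the book's). For each of 100 training sets it computes the conditional error ErrT on a large external test set, 10-fold and leave-one-out CV estimates, and then the mean absolute deviations and the correlation with ErrT.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, sigma = 80, 20, 1.0              # illustrative sizes, not the book's design
beta = rng.normal(size=p)              # true coefficients (assumed)

def gen(n):
    X = rng.normal(size=(n, p))
    y = X @ beta + sigma * rng.normal(size=n)
    return X, y

def ls_fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cv_error(X, y, K):
    """K-fold cross-validation estimate of squared-error prediction error."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), K)
    errs = []
    for idx in folds:
        train = np.ones(n, dtype=bool)
        train[idx] = False
        b = ls_fit(X[train], y[train])
        errs.append(np.mean((y[idx] - X[idx] @ b) ** 2))
    return float(np.mean(errs))

Xtest, ytest = gen(20_000)             # large external test set, to approximate ErrT
errT, cv10, cvN = [], [], []
for _ in range(100):                   # 100 training sets, as in the figures
    X, y = gen(N)
    b = ls_fit(X, y)
    errT.append(np.mean((ytest - Xtest @ b) ** 2))
    cv10.append(cv_error(X, y, 10))
    cvN.append(cv_error(X, y, N))      # leave-one-out

errT, cv10, cvN = map(np.array, (errT, cv10, cvN))
Err = errT.mean()                      # Monte Carlo proxy for the expected error Err
print("mean |CV10 - ErrT| :", np.mean(np.abs(cv10 - errT)))
print("mean |CVN  - ErrT| :", np.mean(np.abs(cvN - errT)))
print("mean |CV10 - Err | :", np.mean(np.abs(cv10 - Err)))
print("corr(CV10, ErrT)   :", np.corrcoef(cv10, errT)[0, 1])
```

The exact numbers depend on the assumed design; the qualitative pattern the figures illustrate is that both CV curves track the expected error Err more faithfully than they track the conditional error ErrT of any particular training set.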
FIGURE 7.14. Conditional prediction-error ErrT, 10-fold cross-validation, and leave-one-out cross-validation curves for 100 simulations from the top-right panel in Figure 7.3. The thick red curve is the expected prediction error Err, while the thick black curves are the expected CV curves E_T CV_10 and E_T CV_N. The lower-right panel shows the mean absolute deviation of the CV curves from the conditional error, E_T|CV_K − ErrT| for K = 10 (blue) and K = N (green), as well as from the expected error E_T|CV_10 − Err| (orange).
FIGURE 7.15. Plots of the CV estimates of error versus the true conditional error for each of the 100 training sets, for the simulation setup in the top right panel of Figure 7.3. Both 10-fold and leave-one-out CV are depicted in different colors. The first three panels correspond to different subset sizes p, and vertical and horizontal lines are drawn at Err(p). Although there appears to be little correlation in these plots, we see in the lower right panel that for the most part the correlation is negative.

Bibliographic Notes

Key references for cross-validation are Stone (1974), Stone (1977) and Allen (1974).
The AIC was proposed by Akaike (1973), while the BIC was introduced by Schwarz (1978). Madigan and Raftery (1994) give an overview of Bayesian model selection. The MDL criterion is due to Rissanen (1983). Cover and Thomas (1991) contains a good description of coding theory and complexity. VC dimension is described in Vapnik (1996). Stone (1977) showed that the AIC and leave-one-out cross-validation are asymptotically equivalent.
Generalized cross-validation is described by Golub et al. (1979) and Wahba (1980); a further discussion of the topic may be found in the monograph by Wahba (1990). See also Hastie and Tibshirani (1990), Chapter 3. The bootstrap is due to Efron (1979); see Efron and Tibshirani (1993) for an overview. Efron (1983) proposes a number of bootstrap estimates of prediction error, including the optimism and .632 estimates. Efron (1986) compares CV, GCV and bootstrap estimates of error rates. The use of cross-validation and the bootstrap for model selection is studied by Breiman and Spector (1992), Breiman (1992), Shao (1996), Zhang (1993) and Kohavi (1995).
The .632+ estimator was proposed by Efron and Tibshirani (1997).

Cherkassky and Ma (2003) published a study on the performance of SRM for model selection in regression, in response to our study of Section 7.9.1. They complained that we had been unfair to SRM because we had not applied it properly. Our response can be found in the same issue of the journal (Hastie et al. (2003)).

Exercises
Ex. 7.1 Derive the estimate of in-sample error (7.24).

Ex. 7.2 For 0–1 loss with Y ∈ {0, 1} and Pr(Y = 1|x0) = f(x0), show that
\[
\mathrm{Err}(x_0) = \Pr(Y \ne \hat G(x_0)\,|\,X = x_0)
= \mathrm{Err}_B(x_0) + |2f(x_0) - 1|\,\Pr(\hat G(x_0) \ne G(x_0)\,|\,X = x_0), \tag{7.62}
\]
where \(\hat G(x) = I(\hat f(x) > \tfrac{1}{2})\), \(G(x) = I(f(x) > \tfrac{1}{2})\) is the Bayes classifier, and \(\mathrm{Err}_B(x_0) = \Pr(Y \ne G(x_0)\,|\,X = x_0)\), the irreducible Bayes error at x0.
Using the approximation \(\hat f(x_0) \sim N(\mathrm{E}\hat f(x_0), \mathrm{Var}(\hat f(x_0)))\), show that
\[
\Pr(\hat G(x_0) \ne G(x_0)\,|\,X = x_0) \approx
\Phi\!\left( \frac{\mathrm{sign}\big(\tfrac{1}{2} - f(x_0)\big)\big(\mathrm{E}\hat f(x_0) - \tfrac{1}{2}\big)}{\sqrt{\mathrm{Var}(\hat f(x_0))}} \right). \tag{7.63}
\]
In the above,
\[
\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp(-u^2/2)\,du,
\]
the cumulative Gaussian distribution function. This is an increasing function, with value 0 at t = −∞ and value 1 at t = +∞.

We can think of sign(1/2 − f(x0))(Ef̂(x0) − 1/2) as a kind of boundary bias term, as it depends on the true f(x0) only through which side of the boundary (1/2) it lies on.
Notice also that the bias and variance combine in a multiplicative rather than additive fashion. If Ef̂(x0) is on the same side of 1/2 as f(x0), then the bias is negative, and decreasing the variance will decrease the misclassification error. On the other hand, if Ef̂(x0) is on the opposite side of 1/2 to f(x0), then the bias is positive and it pays to increase the variance! Such an increase will improve the chance that f̂(x0) falls on the correct side of 1/2 (Friedman, 1997).
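As a sanity check on (7.62) and (7.63), and not a substitute for the derivation the exercise asks for, the following Monte Carlo sketch fixes a point x0 and assumes f̂(x0) is Gaussian; the values chosen for f(x0), Ef̂(x0) and the standard deviation are arbitrary illustrative numbers.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
f_x0 = 0.7                        # assumed true Pr(Y = 1 | x0)
mu_hat, sd_hat = 0.45, 0.15       # assumed E f_hat(x0) and sd(f_hat(x0))

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

M = 200_000
f_hat = rng.normal(mu_hat, sd_hat, size=M)     # draws of f_hat(x0)
Y = (rng.random(M) < f_x0).astype(int)         # draws of Y at x0, independent of f_hat
G_hat = (f_hat > 0.5).astype(int)              # estimated classifier at x0
G = int(f_x0 > 0.5)                            # Bayes classifier at x0

err_x0 = np.mean(Y != G_hat)                   # Err(x0), simulated
err_B = min(f_x0, 1.0 - f_x0)                  # Bayes error Pr(Y != G(x0) | X = x0)
p_flip = np.mean(G_hat != G)                   # Pr(G_hat(x0) != G(x0)), simulated
rhs_62 = err_B + abs(2.0 * f_x0 - 1.0) * p_flip
approx_63 = Phi(np.sign(0.5 - f_x0) * (mu_hat - 0.5) / sd_hat)

print("Err(x0) simulated        :", err_x0)
print("(7.62) right-hand side   :", rhs_62)
print("Pr(Ghat != G), simulated :", p_flip)
print("(7.63) normal approx     :", approx_63)
```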
Ex. 7.3 Let f̂ = Sy be a linear smoothing of y.

(a) If Sii is the ith diagonal element of S, show that for S arising from least squares projections and cubic smoothing splines, the cross-validated residual can be written as
\[
y_i - \hat f^{-i}(x_i) = \frac{y_i - \hat f(x_i)}{1 - S_{ii}}. \tag{7.64}
\]

(b) Use this result to show that \(|y_i - \hat f^{-i}(x_i)| \ge |y_i - \hat f(x_i)|\).

(c) Find general conditions on any smoother S to make result (7.64) hold.

Ex. 7.4 Consider the in-sample prediction error (7.18) and the training error err in the case of squared-error loss:
\[
\mathrm{Err_{in}} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{E}_{Y^0}\big(Y_i^0 - \hat f(x_i)\big)^2,
\qquad
\mathrm{err} = \frac{1}{N}\sum_{i=1}^{N} \big(y_i - \hat f(x_i)\big)^2.
\]
Add and subtract f(xi) and Ef̂(xi) in each expression and expand. Hence establish that the average optimism in the training error is
\[
\frac{2}{N}\sum_{i=1}^{N} \mathrm{Cov}(\hat y_i, y_i),
\]
as given in (7.21).

Ex. 7.5 For a linear smoother ŷ = Sy, show that
\[
\sum_{i=1}^{N} \mathrm{Cov}(\hat y_i, y_i) = \mathrm{trace}(\mathbf{S})\,\sigma_\varepsilon^2, \tag{7.65}
\]
which justifies its use as the effective number of parameters.
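A quick numerical check of the leave-one-out identity (7.64), assuming S is the hat matrix of a least-squares projection; the script also prints trace(S), the effective number of parameters that (7.65) motivates (for a projection it equals p). The data are simulated, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 30, 5
X = rng.normal(size=(N, p))
y = X @ rng.normal(size=p) + rng.normal(size=N)

S = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix of the least-squares projection
f_hat = S @ y
S_ii = np.diag(S)

# Direct leave-one-out residuals, for comparison with the shortcut (7.64)
loo = np.empty(N)
for i in range(N):
    keep = np.arange(N) != i
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    loo[i] = y[i] - X[i] @ b

shortcut = (y - f_hat) / (1.0 - S_ii)     # right-hand side of (7.64)
print("max |direct LOO residual - shortcut| :", np.max(np.abs(loo - shortcut)))
print("trace(S) =", np.trace(S), "  (equals p =", p, "for a projection)")
```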
Ex. 7.6 Show that for an additive-error model, the effective degrees-of-freedom for the k-nearest-neighbors regression fit is N/k.

Ex. 7.7 Use the approximation 1/(1 − x)² ≈ 1 + 2x to expose the relationship between Cp/AIC (7.26) and GCV (7.52), the main difference being the model used to estimate the noise variance σε².

Ex. 7.8 Show that the set of functions {I(sin(αx) > 0)} can shatter the following points on the line:
\[
z^1 = 10^{-1}, \;\ldots,\; z^\ell = 10^{-\ell}, \tag{7.66}
\]
for any ℓ. Hence the VC dimension of the class {I(sin(αx) > 0)} is infinite.
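The shattering claim of Ex. 7.8 can also be verified numerically for a small ℓ. The sketch below uses one well-known choice of frequency, α = π(1 + Σᵢ (1 − yᵢ)10ⁱ), to realize an arbitrary labeling y ∈ {0, 1}^ℓ of the points (7.66); this particular construction is our own illustration of the shattering argument, not something specified in the exercise.

```python
import numpy as np
from itertools import product

ell = 6
z = 10.0 ** -np.arange(1, ell + 1)        # the points z^i = 10^{-i} of (7.66)

def alpha_for(labels):
    """Frequency realizing the 0/1 labeling: alpha = pi * (1 + sum_i (1 - y_i) 10^i)."""
    i = np.arange(1, ell + 1)
    return np.pi * (1.0 + np.sum((1 - np.asarray(labels)) * 10.0 ** i))

all_ok = True
for labels in product([0, 1], repeat=ell):     # every one of the 2^ell labelings
    a = alpha_for(labels)
    realized = (np.sin(a * z) > 0).astype(int)
    all_ok &= np.array_equal(realized, np.array(labels))

print("all", 2 ** ell, "labelings realized:", all_ok)
```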
Ex. 7.9 For the prostate data of Chapter 3, carry out a best-subset linear regression analysis, as in Table 3.3 (third column from left). Compute the AIC, BIC, five- and tenfold cross-validation, and bootstrap .632 estimates of prediction error. Discuss the results.

Ex. 7.10 Referring to the example in Section 7.10.3, suppose instead that all of the p predictors are binary, and hence there is no need to estimate split points. The predictors are independent of the class labels as before. Then if p is very large, we can probably find a predictor that splits the entire training data perfectly, and hence would split the validation data (one-fifth of data) perfectly as well. This predictor would therefore have zero cross-validation error.
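A small simulation of the premise of Ex. 7.10, assuming N = 12 balanced observations and p independent Bernoulli(1/2) predictors that carry no signal (these sizes are illustrative, not those of Section 7.10.3); it counts how often some predictor separates the two classes perfectly as p grows.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 12
y = np.array([0, 1] * (N // 2))                    # balanced labels, independent of X

for p in (100, 2_000, 20_000):
    reps, hits = 200, 0
    for _ in range(reps):
        X = rng.integers(0, 2, size=(N, p))        # binary predictors with no signal
        # a column "splits perfectly" if it equals y (or 1 - y) on every observation
        perfect = np.any(np.all(X == y[:, None], axis=0) |
                         np.all(X == 1 - y[:, None], axis=0))
        hits += int(perfect)
    print(f"p = {p:>6}: perfect split found in {hits}/{reps} training sets")
```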