The Elements of Statistical Learning: Data Mining, Inference, and Prediction


The overall error rate is 42.5%, which can be compared to the null rate of 69% obtained by predicting the most numerous class, Prof/Man (Professional/Managerial). The four best-predicted classes are Retired, Student, Prof/Man, and Homemaker.

FIGURE 10.21. Geological prediction maps of the presence probability (left map) and catch size (right map) obtained from the gradient boosted models.

Figure 10.23 shows the relative predictor variable importances as averaged over all classes (10.46). Figure 10.24 displays the individual relative importance distributions (10.45) for each of the four best-predicted classes. One sees that the most relevant predictors generally differ from class to class. An exception is age, which is among the three most relevant predictors for Retired, Student, and Prof/Man.

Figure 10.25 shows the partial dependence of the log-odds (10.52) on age for these three classes.
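Partial dependence curves of the kind shown in Figure 10.25 can be sketched as follows. This is a minimal illustration, not the authors' code, and several pieces are assumptions: `predict_proba` stands for any fitted K-class model that returns class probabilities (a hypothetical interface, not a specific library's), each curve is the marginal average (10.47) estimated by averaging over the training rows with one feature held fixed, and the per-class log-odds are taken to be the symmetric form $f_k(x) = \log p_k(x) - \frac{1}{K}\sum_{l=1}^K \log p_l(x)$, which is our reading of (10.52).

```python
import numpy as np

def centered_log_odds(probs, eps=1e-12):
    """Symmetric multiclass log-odds: log p_k minus the average of log p_l over all K classes."""
    logp = np.log(np.clip(probs, eps, 1.0))
    return logp - logp.mean(axis=1, keepdims=True)

def partial_dependence(predict_proba, X, feature, grid):
    """Marginal-average partial dependence of every class's centered log-odds on one feature.

    For each grid value v, column `feature` is set to v in every row while the other
    columns keep their observed values; the model output is then averaged over rows.
    Returns an array of shape (len(grid), K).
    """
    curves = []
    for v in grid:
        X_fixed = X.copy()
        X_fixed[:, feature] = v
        curves.append(centered_log_odds(predict_proba(X_fixed)).mean(axis=0))
    return np.array(curves)

# Hypothetical 3-class softmax model, used only to exercise the function.
def toy_predict_proba(X):
    # Class 0 score rises with X[:, 0], class 1 score falls with it,
    # class 2 score depends only on X[:, 1].
    scores = np.stack([X[:, 0], -X[:, 0], 0.5 * X[:, 1]], axis=1)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
pd_curves = partial_dependence(toy_predict_proba, X, feature=0, grid=np.linspace(-2.0, 2.0, 5))
print(pd_curves.round(3))
```

For this toy model the centered log-odds of class 0 are linear in `X[:, 0]` with slope 1, so its curve rises by exactly the grid spacing at each step, while the class-2 curve is flat. A curve like the age panels of Figure 10.25 would come from passing the age codes as `grid` for a real fitted model.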

The abscissa values are ordered codes for respective equally spaced age intervals. One sees that after accounting for the contributions of the other variables, the odds of being retired are higher for older people, whereas the opposite is the case for being a student. The odds of being professional/managerial are highest for middle-aged people. These results are of course not surprising. They illustrate that inspecting partial dependences separately for each class can lead to sensible results.

FIGURE 10.22. Error rate for each occupation in the demographics data. (Overall error rate = 0.425; horizontal axis: error rate, 0.0 to 1.0.)

FIGURE 10.23. Relative importance of the predictors as averaged over all classes for the demographics data. (Horizontal axis: relative importance, 0 to 100.)

FIGURE 10.24. Predictor variable importances separately for each of the four classes with lowest error rate for the demographics data. (Panels: Class = Retired, Class = Student, Class = Prof/Man, Class = Homemaker; horizontal axis: relative importance, 0 to 100.)

FIGURE 10.25. Partial dependence of the odds of three different occupations on age, for the demographics data. (Panels: Retired, Student, Prof/Man; horizontal axis: age, coded 1 to 7; vertical axis: partial dependence.)

Bibliographic Notes

Schapire (1990) developed the first simple boosting procedure in the PAC learning framework (Valiant, 1984; Kearns and Vazirani, 1994). Schapire showed that a weak learner could always improve its performance by training two additional classifiers on filtered versions of the input data stream. A weak learner is an algorithm for producing a two-class classifier with performance guaranteed (with high probability) to be significantly better than a coin flip.

After learning an initial classifier $G_1$ on the first $N$ training points,
• $G_2$ is learned on a new sample of $N$ points, half of which are misclassified by $G_1$;
• $G_3$ is learned on $N$ points for which $G_1$ and $G_2$ disagree;
• the boosted classifier is $G_B = \text{majority vote}(G_1, G_2, G_3)$.

Schapire's "Strength of Weak Learnability" theorem proves that $G_B$ has improved performance over $G_1$.

Freund (1995) proposed a "boost by majority" variation that combined many weak learners simultaneously and improved the performance of Schapire's simple boosting algorithm. The theory supporting both of these algorithms requires the weak learner to produce a classifier with a fixed error rate. This led to the more adaptive and realistic AdaBoost (Freund and Schapire, 1996a) and its offspring, where this assumption was dropped.

Freund and Schapire (1996a) and Schapire and Singer (1999) provide some theory to support their algorithms, in the form of upper bounds on generalization error. This theory has evolved in the computational learning community, initially based on the concepts of PAC learning.
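Schapire's three-classifier scheme (train $G_1$, then $G_2$ on a half-misclassified sample, then $G_3$ on the disagreement region, then vote) can be simulated in a few lines. This is a minimal sketch under stated assumptions, not Schapire's original procedure: the weak learner is a decision stump, the "input data stream" is a hypothetical generator `draw` of 2-D Gaussian points labeled by the sign of $x_1 + x_2$, and filtering is done by plain rejection sampling. It shows the data flow only; it does not demonstrate the theorem's guarantee, which relies on the weak learner beating a coin flip on every filtered distribution.

```python
import numpy as np

def fit_stump(X, y):
    """Hypothetical weak learner: single-feature threshold rule with the lowest training error."""
    best_err, best_rule = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                err = np.mean(np.where(X[:, j] > t, s, -s) != y)
                if err < best_err:
                    best_err, best_rule = err, (j, t, s)
    j, t, s = best_rule
    return lambda Z: np.where(Z[:, j] > t, s, -s)

def draw(n, rng):
    """Hypothetical data stream: 2-D Gaussian inputs labeled by the sign of x1 + x2."""
    X = rng.normal(size=(n, 2))
    return X, np.where(X.sum(axis=1) > 0, 1, -1)

def filtered(rng, n, keep, max_rounds=1000):
    """Rejection-sample the stream until n points with keep(X, y) == True are collected."""
    Xs, ys, count = [], [], 0
    for _ in range(max_rounds):
        X, y = draw(4 * n, rng)
        mask = keep(X, y)
        Xs.append(X[mask])
        ys.append(y[mask])
        count += int(mask.sum())
        if count >= n:
            return np.vstack(Xs)[:n], np.concatenate(ys)[:n]
    raise RuntimeError("the filter accepted too few points")

def schapire_boost(n, rng):
    X1, y1 = draw(n, rng)  # G1: the first N points of the stream
    G1 = fit_stump(X1, y1)
    # G2: N new points, half misclassified by G1 and half classified correctly.
    Xw, yw = filtered(rng, n // 2, lambda X, y: G1(X) != y)
    Xr, yr = filtered(rng, n - n // 2, lambda X, y: G1(X) == y)
    G2 = fit_stump(np.vstack([Xw, Xr]), np.concatenate([yw, yr]))
    # G3: N points on which G1 and G2 disagree.
    X3, y3 = filtered(rng, n, lambda X, y: G1(X) != G2(X))
    G3 = fit_stump(X3, y3)
    # G_B: majority vote of three +/-1 votes (an odd count, so no ties).
    return G1, lambda Z: np.sign(G1(Z) + G2(Z) + G3(Z))

rng = np.random.default_rng(1)
G1, GB = schapire_boost(200, rng)
X_test, y_test = draw(5000, rng)
err_g1 = np.mean(G1(X_test) != y_test)
err_gb = np.mean(GB(X_test) != y_test)
print(f"G1 error: {err_g1:.3f}, majority-vote error: {err_gb:.3f}")
```

The `max_rounds` guard exists because the filters for $G_2$ and $G_3$ never terminate if $G_1$ makes no errors or if $G_2$ happens to coincide with $G_1$.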

Other theories attempting to explain boosting come from game theory (Freund and Schapire, 1996b; Breiman, 1999; Breiman, 1998) and VC theory (Schapire et al., 1998). The bounds and the theory associated with the AdaBoost algorithms are interesting, but tend to be too loose to be of practical importance. In practice, boosting achieves results far more impressive than the bounds would imply. Schapire (2002) and Meir and Rätsch (2003) give useful overviews more recent than the first edition of this book. Friedman et al. (2000) and Friedman (2001) form the basis for our exposition in this chapter.
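To contrast AdaBoost with the fixed-error-rate schemes above, here is a minimal sketch of the AdaBoost.M1 reweighting loop with decision stumps, on the ten-dimensional Gaussian example that Ex. 10.4 refers to (Figure 10.2). The data rule ($Y = 1$ when $\sum_j X_j^2 > 9.34$, the median of $\chi^2_{10}$) is our recollection of that example's setup and should be treated as an assumption, as should the stump learner, function names, and sample sizes.

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Decision stump minimizing weighted misclassification error; labels are in {-1, +1}."""
    total = w.sum()
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, ys, ws = X[order, j], y[order], w[order]
        # Rule with sign +1: predict -1 when x <= xs[i] and +1 otherwise.
        left_pos = np.cumsum(ws * (ys == 1))
        right_neg = ws[ys == -1].sum() - np.cumsum(ws * (ys == -1))
        err_plus = left_pos + right_neg
        valid = np.r_[xs[1:] > xs[:-1], True]  # never split inside a run of tied values
        for s, err in ((1, err_plus), (-1, total - err_plus)):  # sign -1 is the complement
            err = np.where(valid, err, np.inf)
            i = int(np.argmin(err))
            if err[i] < best[0]:
                best = (err[i], j, xs[i], s)
    _, j, t, s = best
    return lambda Z: np.where(Z[:, j] > t, s, -s)

def adaboost_m1(X, y, M):
    """AdaBoost.M1: upweight misclassified points by exp(alpha_m) after each round."""
    w = np.full(len(y), 1.0 / len(y))
    learners, alphas = [], []
    for _ in range(M):
        G = fit_weighted_stump(X, y, w)
        miss = G(X) != y
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-10, 1 - 1e-10)  # avoid log(0)
        alpha = np.log((1 - err) / err)
        w = w * np.exp(alpha * miss)
        w /= w.sum()  # rescaling leaves err_m unchanged; it only prevents overflow
        learners.append(G)
        alphas.append(alpha)
    return lambda Z: np.sign(sum(a * G(Z) for a, G in zip(alphas, learners)))

def simulate(n, rng):
    """Assumed setup: ten standard Gaussian features, Y = +1 if the sum of squares > 9.34."""
    X = rng.normal(size=(n, 10))
    return X, np.where((X ** 2).sum(axis=1) > 9.34, 1, -1)

rng = np.random.default_rng(0)
X_train, y_train = simulate(1000, rng)
X_test, y_test = simulate(5000, rng)
classifier = adaboost_m1(X_train, y_train, M=200)
test_err = np.mean(classifier(X_test) != y_test)
print(f"test error with 200 stumps: {test_err:.3f}")
```

Only the stump fitter and the data rule are our choices; the loop itself follows the AdaBoost.M1 updates: weighted error $\text{err}_m$, $\alpha_m = \log((1 - \text{err}_m)/\text{err}_m)$, $w_i \leftarrow w_i \exp(\alpha_m I(y_i \ne G_m(x_i)))$, and the final vote $\operatorname{sign}(\sum_m \alpha_m G_m(x))$.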

Friedman et al. (2000) analyze AdaBoost statistically, derive the exponential criterion, and show that minimizing it estimates one-half the log-odds of the class probability. They propose additive tree models, the right-sized trees and ANOVA representation of Section 10.11, and the multiclass logit formulation. Friedman (2001) developed gradient boosting and shrinkage for classification and regression, while Friedman (1999) explored stochastic variants of boosting. Mason et al. (2000) also embraced a gradient approach to boosting. As the published discussions of Friedman et al. (2000) show, there is some controversy about how and why boosting works.

Since the publication of the first edition of this book, these debates have continued and have spread into the statistical community, with a series of papers on the consistency of boosting (Jiang, 2004; Lugosi and Vayatis, 2004; Zhang and Yu, 2005; Bartlett and Traskin, 2007).
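The log-odds connection takes one line to verify; it is result (10.16), which Ex. 10.2 asks the reader to prove. For labels $Y \in \{-1, 1\}$, write $p(x) = \Pr(Y = 1 \mid x)$. The conditional exponential risk and its stationary point are

$$\mathrm{E}\big(e^{-Y f(x)} \mid x\big) = p(x)\, e^{-f(x)} + \big(1 - p(x)\big)\, e^{f(x)},$$

$$\frac{\partial}{\partial f(x)}\,\mathrm{E}\big(e^{-Y f(x)} \mid x\big) = -p(x)\, e^{-f(x)} + \big(1 - p(x)\big)\, e^{f(x)} = 0 \;\Longrightarrow\; f^*(x) = \frac{1}{2}\log\frac{p(x)}{1 - p(x)}.$$

Because the second derivative $p(x)e^{-f(x)} + (1 - p(x))e^{f(x)}$ is positive, this stationary point is the minimizer.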

Mease and Wyner (2008), through a series of simulation examples, challenge some of our interpretations of boosting; our response (Friedman et al., 2008a) puts most of these objections to rest. A recent survey by Bühlmann and Hothorn (2007) supports our approach to boosting.

Exercises

Ex. 10.1 Derive expression (10.12) for the update parameter in AdaBoost.

Ex. 10.2 Prove result (10.16), that is, the minimizer of the population version of the AdaBoost criterion is one-half of the log odds.

Ex. 10.3 Show that the marginal average (10.47) recovers additive and multiplicative functions (10.50) and (10.51), while the conditional expectation (10.49) does not.

Ex. 10.4
(a) Write a program implementing AdaBoost with trees.
(b) Redo the computations for the example of Figure 10.2. Plot the training error as well as test error, and discuss its behavior.
(c) Investigate the number of iterations needed to make the test error finally start to rise.
(d) Change the setup of this example as follows: define two classes, with the features in Class 1 being $X_1, X_2, \ldots, X_{10}$, standard independent Gaussian variates. In Class 2, the features $X_1, X_2, \ldots, X_{10}$ are also standard independent Gaussian, but conditioned on the event $\sum_j X_j^2 > 12$. Now the classes have significant overlap in feature space. Repeat the AdaBoost experiments as in Figure 10.2 and discuss the results.

Ex. 10.5 Multiclass exponential loss (Zhu et al., 2005). For a $K$-class classification problem, consider the coding $Y = (Y_1, \ldots, Y_K)^T$ with
$$Y_k = \begin{cases} 1, & \text{if } G = \mathcal{G}_k, \\ -\dfrac{1}{K-1}, & \text{otherwise.} \end{cases} \tag{10.55}$$
Let $f = (f_1, \ldots, f_K)^T$ with $\sum_{k=1}^K f_k = 0$, and define
$$L(Y, f) = \exp\left(-\frac{1}{K} Y^T f\right). \tag{10.56}$$
(a) Using Lagrange multipliers, derive the population minimizer $f^*$ of $\mathrm{E}\,L(Y, f)$, subject to the zero-sum constraint, and relate these to the class probabilities.
(b) Show that multiclass boosting using this loss function leads to a reweighting algorithm similar to AdaBoost, as in Section 10.4.

Ex. 10.6 McNemar test (Agresti, 1996). We report the test error rates on the spam data to be 5.5% for a generalized additive model (GAM) and 4.5% for gradient boosting (GBM), with a test sample of size 1536.
(a) Show that the standard error of these estimates is about 0.6%.
Since the same test data are used for both methods, the error rates are correlated, and we cannot perform a two-sample t-test. We can compare the methods directly on each test observation, leading to the summary

               GBM Correct   GBM Error
GAM Correct        1434          18
GAM Error            33          51

The McNemar test focuses on the discordant errors, 33 vs. 18.
(b) Conduct a test to show that GAM makes significantly more errors than gradient boosting, with a two-sided p-value of 0.036.

Ex. 10.7 Derive expression (10.32).

Ex. 10.8 Consider a $K$-class problem where the targets $y_{ik}$ are coded as 1 if observation $i$ is in class $k$ and zero otherwise. Suppose we have a current model $f_k(x)$, $k = 1, \ldots, K$, with $\sum_{k=1}^K f_k(x) = 0$ (see (10.21) in Section 10.6). We wish to update the model for observations in a region $R$ in predictor space, by adding constants $f_k(x) + \gamma_k$, with $\gamma_K = 0$.
(a) Write down the multinomial log-likelihood for this problem, and its first and second derivatives.
(b) Using only the diagonal of the Hessian matrix in (a), and starting from $\gamma_k = 0\ \forall k$, show that a one-step approximate Newton update for $\gamma_k$ is
$$\gamma_k^1 = \frac{\sum_{x_i \in R} (y_{ik} - p_{ik})}{\cdots}, \quad k = 1, \ldots$$
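The arithmetic behind Ex. 10.6 can be checked with the Python standard library. This sketch is not part of the exercise: it reads the 2×2 table with rows for GAM and columns for GBM (the only orientation consistent with the quoted 5.5% and 4.5% error rates), uses the binomial standard error $\sqrt{r(1-r)/N}$, and applies the McNemar statistic without continuity correction, with its two-sided normal p-value.

```python
import math

def binomial_se(rate, n):
    """Standard error of an error rate estimated from n independent test cases."""
    return math.sqrt(rate * (1 - rate) / n)

def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) and two-sided p-value.

    b and c are the two discordant counts of a paired 2x2 table."""
    z = (b - c) / math.sqrt(b + c)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z * z, p

n_test = 1434 + 18 + 33 + 51   # all four cells of the table
gam_err = (33 + 51) / n_test   # GAM wrong, whatever GBM did
gbm_err = (18 + 51) / n_test   # GBM wrong, whatever GAM did
se_gam = binomial_se(gam_err, n_test)
se_gbm = binomial_se(gbm_err, n_test)
chi2, p_value = mcnemar(33, 18)
print(f"SE(GAM) = {se_gam:.4f}, SE(GBM) = {se_gbm:.4f}")
print(f"McNemar chi^2 = {chi2:.2f}, two-sided p = {p_value:.3f}")
```

This gives standard errors of about 0.58% and 0.53% and a p-value of about 0.036, in line with the exercise. Applying a continuity correction, $(|33 - 18| - 1)^2/51 \approx 3.84$, would instead give a p-value close to 0.05.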
