The Elements of Statistical Learning: Data Mining, Inference, and Prediction

However, on finite samples squared-error loss places much more emphasis on observations with large absolute residuals $|y_i - f(x_i)|$ during the fitting process. It is thus far less robust, and its performance severely degrades for long-tailed error distributions and especially for grossly mismeasured $y$-values (“outliers”).

Other more robust criteria, such as absolute loss, perform much better in these situations. In the statistical robustness literature, a variety of regression loss criteria have been proposed that provide strong resistance (if not absolute immunity) to gross outliers while being nearly as efficient as least squares for Gaussian errors. They are often better than either for error distributions with moderately heavy tails. One such criterion is the Huber loss criterion used for M-regression (Huber, 1964)

$$
L(y, f(x)) =
\begin{cases}
[\,y - f(x)\,]^2 & \text{for } |y - f(x)| \le \delta, \\[2pt]
2\delta\,|y - f(x)| - \delta^2 & \text{otherwise.}
\end{cases}
\tag{10.23}
$$

Figure 10.5 compares these three loss functions.

These considerations suggest that when robustness is a concern, as is especially the case in data mining applications (see Section 10.7), squared-error loss for regression and exponential loss for classification are not the best criteria from a statistical perspective.
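To make (10.23) concrete, here is a minimal sketch of the Huber criterion in NumPy; the function name and the default value of δ are illustrative choices, not from the text.

```python
import numpy as np

def huber_loss(y, f, delta=1.0):
    """Huber loss (10.23): quadratic for small residuals, linear for large ones.
    `delta` is the transition point; its default here is purely illustrative."""
    r = np.abs(y - f)                          # absolute residual |y - f(x)|
    return np.where(r <= delta,
                    r ** 2,                    # [y - f(x)]^2
                    2 * delta * r - delta ** 2)  # 2*delta*|y - f(x)| - delta^2
```

Note that the two branches agree at $|y - f(x)| = \delta$ (both equal $\delta^2$), so the criterion transitions continuously from quadratic to linear growth in the residual.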

However, they both lead to the elegant modular boosting algorithms in the context of forward stagewise additive modeling. For squared-error loss one simply fits the base learner to the residuals from the current model $y_i - f_{m-1}(x_i)$ at each step. For exponential loss one performs a weighted fit of the base learner to the output values $y_i$, with weights $w_i = \exp(-y_i f_{m-1}(x_i))$.

FIGURE 10.5. A comparison of three loss functions for regression, plotted as a function of the margin $y - f$. The Huber loss function combines the good properties of squared-error loss near zero and absolute-error loss when $|y - f|$ is large.
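To illustrate how this stagewise recipe operates for squared-error loss, the sketch below fits a regression stump to the current residuals at each step; the use of a scikit-learn stump as the base learner, the number of steps M, and the absence of shrinkage are illustrative assumptions, not the book's algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # stump base learner (illustrative choice)

def stagewise_squared_error(X, y, M=100, depth=1):
    """Forward stagewise additive modeling with squared-error loss:
    each new base learner is fit to the current residuals y_i - f_{m-1}(x_i)."""
    f = np.zeros(len(y))                 # start from f_0 = 0
    learners = []
    for m in range(M):
        residuals = y - f                # y_i - f_{m-1}(x_i)
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residuals)
        learners.append(tree)
        f = f + tree.predict(X)          # f_m = f_{m-1} + new base learner
    return learners

# For exponential loss with y_i in {-1, +1}, the analogous step is instead a
# *weighted* fit of the base learner to the outputs y_i, with observation
# weights w_i = exp(-y_i * f_{m-1}(x_i)), as described in the text.
```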

Using other more robust criteria directly in their place does not give rise to such simple feasible boosting algorithms. However, in Section 10.10.2 we show how one can derive simple elegant boosting algorithms based on any differentiable loss criterion, thereby producing highly robust boosting procedures for data mining.

10.7 “Off-the-Shelf” Procedures for Data Mining

Predictive learning is an important aspect of data mining.

As can be seen from this book, a wide variety of methods have been developed for predictive learning from data. For each particular method there are situations for which it is particularly well suited, and others where it performs badly compared to the best that can be done with that data. We have attempted to characterize appropriate situations in our discussions of each of the respective methods. However, it is seldom known in advance which procedure will perform best or even well for any given problem.

Table 10.1 summarizes some of the characteristics of a number of learning methods.

TABLE 10.1. Some characteristics of different learning methods. Key: ▲ = good, ◆ = fair, ▼ = poor.

Characteristic                                        Neural Nets   SVM   Trees   MARS   k-NN, Kernels
Natural handling of data of “mixed” type                   ▼         ▼      ▲       ▲         ▼
Handling of missing values                                 ▼         ▼      ▲       ▲         ▲
Robustness to outliers in input space                      ▼         ▼      ▲       ▼         ▲
Insensitive to monotone transformations of inputs          ▼         ▼      ▲       ▼         ▼
Computational scalability (large N)                        ▼         ▼      ▲       ▲         ▼
Ability to deal with irrelevant inputs                     ▼         ▼      ▲       ▲         ▼
Ability to extract linear combinations of features         ▲         ▲      ▼       ▼         ◆
Interpretability                                           ▼         ▼      ◆       ▲         ▼
Predictive power                                           ▲         ▲      ▼       ◆         ▲

Industrial and commercial data mining applications tend to be especially challenging in terms of the requirements placed on learning procedures. Data sets are often very large in terms of number of observations and number of variables measured on each of them. Thus, computational considerations play an important role. Also, the data are usually messy: the inputs tend to be mixtures of quantitative, binary, and categorical variables, the latter often with many levels.

There are generally many missing values, complete observations being rare. Distributions of numeric predictor and response variables are often long-tailed and highly skewed. This is the case for the spam data (Section 9.1.2); when fitting a generalized additive model, we first log-transformed each of the predictors in order to get a reasonable fit.

In addition they usually contain a substantial fraction of gross mis-measurements (outliers). The predictor variables are generally measured on very different scales.

In data mining applications, usually only a small fraction of the large number of predictor variables that have been included in the analysis are actually relevant to prediction. Also, unlike many applications such as pattern recognition, there is seldom reliable domain knowledge to help create especially relevant features and/or filter out the irrelevant ones, the inclusion of which dramatically degrades the performance of many methods.

In addition, data mining applications generally require interpretable models.

It is not enough to simply produce predictions. It is also desirable to have information providing qualitative understanding of the relationship between joint values of the input variables and the resulting predicted response value. Thus, black box methods such as neural networks, which can be quite useful in purely predictive settings such as pattern recognition, are far less useful for data mining.

These requirements of speed, interpretability and the messy nature of the data sharply limit the usefulness of most learning procedures as off-the-shelf methods for data mining.

An “off-the-shelf” method is one that can be directly applied to the data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure.

Of all the well-known learning methods, decision trees come closest to meeting the requirements for serving as an off-the-shelf procedure for data mining. They are relatively fast to construct and they produce interpretable models (if the trees are small). As discussed in Section 9.2, they naturally incorporate mixtures of numeric and categorical predictor variables and missing values.

They are invariant under (strictly monotone) transformations of the individual predictors. As a result, scaling and/or more general transformations are not an issue, and they are immune to the effects of predictor outliers. They perform internal feature selection as an integral part of the procedure.

They are thereby resistant, if not completely immune, to the inclusion of many irrelevant predictor variables. These properties of decision trees are largely the reason that they have emerged as the most popular learning method for data mining.

Trees have one aspect that prevents them from being the ideal tool for predictive learning, namely inaccuracy. They seldom provide predictive accuracy comparable to the best that can be achieved with the data at hand. As seen in Section 10.1, boosting decision trees improves their accuracy, often dramatically. At the same time it maintains most of their desirable properties for data mining.

Some advantages of trees that are sacrificed by boosting are speed, interpretability, and, for AdaBoost, robustness against overlapping class distributions and especially mislabeling of the training data. A gradient boosted model (GBM) is a generalization of tree boosting that attempts to mitigate these problems, so as to produce an accurate and effective off-the-shelf procedure for data mining.

10.8 Example: Spam Data

Before we go into the details of gradient boosting, we demonstrate its abilities on a two-class classification problem.

The spam data are introduced in Chapter 1, and used as an example for many of the procedures in Chapter 9 (Sections 9.1.2, 9.2.5, 9.3.1 and 9.4.1).

Applying gradient boosting to these data resulted in a test error rate of 4.5%, using the same test set as was used in Section 9.1.2. By comparison, an additive logistic regression achieved 5.5%, a CART tree fully grown and pruned by cross-validation 8.7%, and MARS 5.5%.

The standard error of these estimates is around 0.6%, although gradient boosting is significantly better than all of them using the McNemar test (Exercise 10.6).

In Section 10.13 below we develop a relative importance measure for each predictor, as well as a partial dependence plot describing a predictor’s contribution to the fitted model. We now illustrate these for the spam data. Figure 10.6 displays the relative importance spectrum for all 57 predictor variables.
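As a rough illustration of how such a fit, its importance spectrum, and its partial dependences might be computed, the sketch below uses scikit-learn's GradientBoostingClassifier as a stand-in for the gradient boosted model used in the book; the file name, column layout, train/test split, and tuning parameters are assumptions, so the numbers quoted above will not be reproduced exactly.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

# Hypothetical loading of the spam data: 57 numeric predictors plus a 0/1 "spam" label.
data = pd.read_csv("spambase.csv")            # file name is an assumption
X, y = data.drop(columns="spam"), data["spam"]

# Illustrative random train/test split; the book uses a particular fixed test set.
rng = np.random.default_rng(0)
test = rng.random(len(y)) < 0.33

# Roughly GBM-like settings: small trees, shrinkage, many boosting iterations.
gbm = GradientBoostingClassifier(n_estimators=500, max_leaf_nodes=5, learning_rate=0.05)
gbm.fit(X[~test], y[~test])
print("test error:", np.mean(gbm.predict(X[test]) != y[test]))

# Relative importance spectrum (cf. Figure 10.6), scaled so the largest equals 100.
imp = 100 * gbm.feature_importances_ / gbm.feature_importances_.max()
for name, value in sorted(zip(X.columns, imp), key=lambda t: -t[1])[:5]:
    print(f"{name:>10s} {value:6.1f}")

# Partial dependence of the model output on one predictor (cf. Figure 10.7);
# column index 0 stands in for a predictor such as the frequency of "remove".
pdep = partial_dependence(gbm, X[~test], features=[0])
```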

Clearly some predictors are more important than others in separating spam from email. The frequencies of the character strings !, $, hp, and remove are estimated to be the four most relevant predictor variables. At the other end of the spectrum, the character strings 857, 415, table, and 3d have virtually no relevance.

The quantity being modeled here is the log-odds of spam versus email

$$
f(x) = \log \frac{\Pr(\text{spam}\mid x)}{\Pr(\text{email}\mid x)}
\tag{10.24}
$$

(see Section 10.13 below).
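Since $f(x)$ in (10.24) is a log-odds, the fitted class probability can be recovered by inverting the logit:

$$
\Pr(\text{spam}\mid x) = \frac{e^{f(x)}}{1 + e^{f(x)}} = \frac{1}{1 + e^{-f(x)}},
$$

so large positive values of $f(x)$ correspond to high estimated probabilities of spam.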

Figure 10.7 shows the partial dependence of the log-odds on selected important predictors, two positively associated with spam (! and remove), and two negatively associated (edu and hp). These particular dependencies are seen to be essentially monotonic. There is a general agreement with the corresponding functions found by the additive logistic regression model; see Figure 9.1 on page 303.

Running a gradient boosted model on these data with J = 2 terminal-node trees produces a purely additive (main effects) model for the log-odds, with a corresponding error rate of 4.7%, as compared to 4.5% for the full gradient boosted model (with J = 5 terminal-node trees).

Although not significant, this slightly higher error rate suggests that there may be interactions among some of the important predictor variables. This can be diagnosed through two-variable partial dependence plots. Figure 10.8 shows one of the several such plots displaying strong interaction effects. One sees that for very low frequencies of hp, the log-odds of spam are greatly increased. For high frequencies of hp, the log-odds of spam tend to be much lower and roughly constant as a function of !. As the frequency of hp decreases, the functional relationship with ! strengthens.

FIGURE 10.6. Predictor variable importance spectrum for the spam data.

10.9 Boosting Trees

Regression and classification trees are discussed in detail in Section 9.2. They partition the space of all joint predictor variable values into disjoint regions $R_j$, $j = 1, 2, \ldots, J$, as represented by the terminal nodes of the tree. A constant $\gamma_j$ is assigned to each such region and the predictive rule is

$$
x \in R_j \;\Rightarrow\; f(x) = \gamma_j .
$$
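As a purely hypothetical illustration of this piecewise-constant rule, a tree's prediction can be viewed as looking up the region that contains $x$; the region tests and constants below are placeholders, not taken from any fitted tree.

```python
# Hypothetical 3-region partition of a two-dimensional input space: each R_j is
# a membership test on x, and gamma_j is the constant predicted in that region.
regions = [lambda x: x[0] <  2.5,                     # R_1
           lambda x: x[0] >= 2.5 and x[1] <  1.0,     # R_2
           lambda x: x[0] >= 2.5 and x[1] >= 1.0]     # R_3
gammas = [0.2, -1.3, 0.7]                             # gamma_j (placeholder values)

def tree_predict(x):
    """x in R_j  =>  f(x) = gamma_j; the regions partition the input space."""
    for R_j, gamma_j in zip(regions, gammas):
        if R_j(x):
            return gamma_j

print(tree_predict([3.0, 0.5]))   # falls in R_2, so this prints -1.3
```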
