The variable names are written on the vertical axis.

FIGURE 10.7. Partial dependence of log-odds of spam on four important predictors (!, remove, edu, hp). The red ticks at the base of the plots are deciles of the input variable.

FIGURE 10.8. Partial dependence of the log-odds of spam vs. email as a function of joint frequencies of hp and the character !.

Thus a tree can be formally expressed as
\[
T(x; \Theta) = \sum_{j=1}^{J} \gamma_j \, I(x \in R_j), \tag{10.25}
\]
with parameters $\Theta = \{R_j, \gamma_j\}_1^J$. $J$ is usually treated as a meta-parameter. The parameters are found by minimizing the empirical risk
\[
\hat{\Theta} = \arg\min_{\Theta} \sum_{j=1}^{J} \sum_{x_i \in R_j} L(y_i, \gamma_j). \tag{10.26}
\]
This is a formidable combinatorial optimization problem, and we usually settle for approximate suboptimal solutions.
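To make the notation in (10.25) and (10.26) concrete, here is a small Python sketch (mine, not from the book) in which each region R_j is represented by a boolean membership function and the empirical risk is evaluated for a fixed set of parameters:

```python
import numpy as np

def tree_predict(x, regions, gammas):
    """Evaluate T(x; Theta) = sum_j gamma_j * I(x in R_j), as in (10.25).

    regions : list of functions, regions[j](x) -> bool, forming a disjoint partition
    gammas  : list of constants gamma_j, one per region
    """
    for in_region, gamma in zip(regions, gammas):
        if in_region(x):
            return gamma
    raise ValueError("x falls in no region; the R_j should partition the input space")

def empirical_risk(X, y, regions, gammas, loss):
    """Empirical risk of (10.26): sum over regions of the losses L(y_i, gamma_j)."""
    total = 0.0
    for in_region, gamma in zip(regions, gammas):
        in_j = np.array([in_region(x) for x in X])
        total += sum(loss(yi, gamma) for yi in y[in_j])
    return total

# Toy illustration: two half-line regions on a one-dimensional input.
regions = [lambda x: x < 0.5, lambda x: x >= 0.5]
gammas = [-1.0, 2.0]
X = np.array([0.1, 0.4, 0.7, 0.9])
y = np.array([-0.8, -1.2, 2.3, 1.9])
sq_loss = lambda yi, g: (yi - g) ** 2
print(tree_predict(0.3, regions, gammas))              # -1.0
print(empirical_risk(X, y, regions, gammas, sq_loss))  # risk of this fixed tree
```

The hard combinatorial part is, of course, choosing the regions themselves; the sketch only evaluates a tree once its parameters are given.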
It is useful to divide the optimization problem into two parts:

Finding $\gamma_j$ given $R_j$: Given the $R_j$, estimating the $\gamma_j$ is typically trivial, and often $\hat{\gamma}_j = \bar{y}_j$, the mean of the $y_i$ falling in region $R_j$. For misclassification loss, $\hat{\gamma}_j$ is the modal class of the observations falling in region $R_j$.

Finding $R_j$: This is the difficult part, for which approximate solutions are found. Note also that finding the $R_j$ entails estimating the $\gamma_j$ as well. A typical strategy is to use a greedy, top-down recursive partitioning algorithm to find the $R_j$.
In addition, it is sometimes necessary to approximate (10.26) by a smoother and more convenient criterion for optimizing the $R_j$:
\[
\tilde{\Theta} = \arg\min_{\Theta} \sum_{i=1}^{N} \tilde{L}(y_i, T(x_i, \Theta)). \tag{10.27}
\]
Then given the $\hat{R}_j = \tilde{R}_j$, the $\gamma_j$ can be estimated more precisely using the original criterion.

In Section 9.2 we described such a strategy for classification trees. The Gini index replaced misclassification loss in the growing of the tree (identifying the $R_j$).

The boosted tree model is a sum of such trees,
\[
f_M(x) = \sum_{m=1}^{M} T(x; \Theta_m), \tag{10.28}
\]
induced in a forward stagewise manner (Algorithm 10.2). At each step in the forward stagewise procedure one must solve
\[
\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} L\left(y_i, f_{m-1}(x_i) + T(x_i; \Theta_m)\right) \tag{10.29}
\]
for the region set and constants $\Theta_m = \{R_{jm}, \gamma_{jm}\}_1^{J_m}$ of the next tree, given the current model $f_{m-1}(x)$.

Given the regions $R_{jm}$, finding the optimal constants $\gamma_{jm}$ in each region is typically straightforward:
\[
\hat{\gamma}_{jm} = \arg\min_{\gamma_{jm}} \sum_{x_i \in R_{jm}} L\left(y_i, f_{m-1}(x_i) + \gamma_{jm}\right). \tag{10.30}
\]
Finding the regions is difficult, and even more difficult than for a single tree.
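The forward stagewise recipe (10.28)-(10.30) can be sketched in a few lines of Python. This is an illustrative outline, not the book's algorithm: the routine that proposes the regions R_jm is assumed (a toy stump finder stands in for greedy recursive partitioning), and the per-region constants are obtained by a generic scalar minimization rather than any closed form.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def boost_trees(X, y, loss, fit_regions, M, J):
    """Forward stagewise sum of trees, in the spirit of (10.28)-(10.30).

    loss        : vectorized L(y, f) returning an array of losses
    fit_regions : assumed helper proposing a partition of the data into J regions
                  (returns a list of boolean index arrays)
    """
    f = np.zeros(len(y))                               # f_0 = 0 for simplicity
    trees = []
    for m in range(M):
        regions = fit_regions(X, y, f, J)              # approximate the hard part (10.29)
        gammas = []
        for idx in regions:                            # per-region constant, cf. (10.30)
            obj = lambda g, idx=idx: loss(y[idx], f[idx] + g).sum()
            gammas.append(minimize_scalar(obj).x)
        trees.append((regions, gammas))
        for idx, g in zip(regions, gammas):            # f_m = f_{m-1} + T(.; Theta_m)
            f[idx] += g
    return trees, f

# Toy "region finder": a single split (stump) on feature 0, scored on the
# current residuals; a stand-in for greedy top-down recursive partitioning.
def stump_regions(X, y, f, J=2):
    r = y - f
    best, best_score = None, np.inf
    for t in np.quantile(X[:, 0], [0.25, 0.5, 0.75]):
        left = X[:, 0] <= t
        if left.all() or not left.any():
            continue
        score = r[left].var() * left.sum() + r[~left].var() * (~left).sum()
        if score < best_score:
            best, best_score = (left, ~left), score
    return list(best)

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=100)
sq = lambda yy, ff: (yy - ff) ** 2
trees, fit = boost_trees(X, y, sq, stump_regions, M=50, J=2)
print("training MSE:", sq(y, fit).mean())
```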
For a few special cases, the problem simplifies.

For squared-error loss, the solution to (10.29) is no harder than for a single tree. It is simply the regression tree that best predicts the current residuals $y_i - f_{m-1}(x_i)$, and $\hat{\gamma}_{jm}$ is the mean of these residuals in each corresponding region.

For two-class classification and exponential loss, this stagewise approach gives rise to the AdaBoost method for boosting classification trees (Algorithm 10.1). In particular, if the trees $T(x; \Theta_m)$ are restricted to be scaled classification trees, then we showed in Section 10.4 that the solution to (10.29) is the tree that minimizes the weighted error rate $\sum_{i=1}^{N} w_i^{(m)} I(y_i \neq T(x_i; \Theta_m))$ with weights $w_i^{(m)} = e^{-y_i f_{m-1}(x_i)}$. By a scaled classification tree, we mean $\beta_m T(x; \Theta_m)$, with the restriction that $\gamma_{jm} \in \{-1, 1\}$.

Without this restriction, (10.29) still simplifies for exponential loss to a weighted exponential criterion for the new tree:
\[
\hat{\Theta}_m = \arg\min_{\Theta_m} \sum_{i=1}^{N} w_i^{(m)} \exp[-y_i T(x_i; \Theta_m)]. \tag{10.31}
\]
It is straightforward to implement a greedy recursive-partitioning algorithm using this weighted exponential loss as a splitting criterion.
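Before turning to the leaf estimates for the exponential case, here is a minimal sketch of one stagewise step for the squared-error case just described, assuming scikit-learn's DecisionTreeRegressor as the base learner (an illustrative assumption; the book prescribes no particular implementation). Because the tree is grown on the current residuals with squared-error splitting, its fitted leaf values are already the region means called for by (10.30).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stagewise_step_squared_error(X, y, f_prev, max_leaf_nodes=8):
    """One step of (10.29) under squared-error loss: fit a regression tree to the
    current residuals; its leaf means are the gamma_jm of (10.30)."""
    residuals = y - f_prev                      # y_i - f_{m-1}(x_i)
    tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
    tree.fit(X, residuals)
    return f_prev + tree.predict(X), tree       # updated fit f_m on the training set
```

Iterating this step M times gives least-squares boosting of regression trees.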
Given the $R_{jm}$, one can show (Exercise 10.7) that the solution to (10.30) is the weighted log-odds in each corresponding region
\[
\hat{\gamma}_{jm} = \frac{1}{2} \log \frac{\sum_{x_i \in R_{jm}} w_i^{(m)} I(y_i = 1)}{\sum_{x_i \in R_{jm}} w_i^{(m)} I(y_i = -1)}. \tag{10.32}
\]
This requires a specialized tree-growing algorithm; in practice, we prefer the approximation presented below that uses a weighted least squares regression tree.

Using loss criteria such as the absolute error or the Huber loss (10.23) in place of squared-error loss for regression, and the deviance (10.22) in place of exponential loss for classification, will serve to robustify boosting trees. Unfortunately, unlike their nonrobust counterparts, these robust criteria do not give rise to simple fast boosting algorithms.

For more general loss criteria the solution to (10.30), given the $R_{jm}$, is typically straightforward since it is a simple "location" estimate. For absolute loss it is just the median of the residuals in each respective region. For the other criteria fast iterative algorithms exist for solving (10.30), and usually their faster "single-step" approximations are adequate.
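Once the regions are fixed, these per-region "location" estimates are indeed simple to compute. A small numpy sketch, under the assumption that region membership is supplied as a boolean index array:

```python
import numpy as np

def gamma_exponential(y, f_prev, region):
    """Weighted log-odds (10.32) for one region under exponential loss, y in {-1, +1}.
    (In practice one would guard against a region containing only one class.)"""
    w = np.exp(-y * f_prev)                     # w_i^{(m)} = exp(-y_i f_{m-1}(x_i))
    num = w[region & (y == 1)].sum()
    den = w[region & (y == -1)].sum()
    return 0.5 * np.log(num / den)

def gamma_absolute(y, f_prev, region):
    """For absolute loss, the per-region solution to (10.30) is the median of the
    current residuals in that region."""
    return np.median((y - f_prev)[region])
```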
The problem is tree induction. Simple fast algorithms do not exist for solving (10.29) for these more general loss criteria, and approximations like (10.27) become essential.

10.10 Numerical Optimization via Gradient Boosting

Fast approximate algorithms for solving (10.29) with any differentiable loss criterion can be derived by analogy to numerical optimization. The loss in using $f(x)$ to predict $y$ on the training data is
\[
L(f) = \sum_{i=1}^{N} L(y_i, f(x_i)). \tag{10.33}
\]
The goal is to minimize $L(f)$ with respect to $f$, where here $f(x)$ is constrained to be a sum of trees (10.28).
Ignoring this constraint, minimizing (10.33) can be viewed as a numerical optimization
\[
\hat{\mathbf{f}} = \arg\min_{\mathbf{f}} L(\mathbf{f}), \tag{10.34}
\]
where the "parameters" $\mathbf{f} \in \mathbb{R}^N$ are the values of the approximating function $f(x_i)$ at each of the $N$ data points $x_i$:
\[
\mathbf{f} = \{f(x_1), f(x_2), \ldots, f(x_N)\}^T.
\]
Numerical optimization procedures solve (10.34) as a sum of component vectors
\[
\mathbf{f}_M = \sum_{m=0}^{M} \mathbf{h}_m, \quad \mathbf{h}_m \in \mathbb{R}^N,
\]
where $\mathbf{f}_0 = \mathbf{h}_0$ is an initial guess, and each successive $\mathbf{f}_m$ is induced based on the current parameter vector $\mathbf{f}_{m-1}$, which is the sum of the previously induced updates. Numerical optimization methods differ in their prescriptions for computing each increment vector $\mathbf{h}_m$ ("step").

10.10.1 Steepest Descent

Steepest descent chooses $\mathbf{h}_m = -\rho_m \mathbf{g}_m$, where $\rho_m$ is a scalar and $\mathbf{g}_m \in \mathbb{R}^N$ is the gradient of $L(\mathbf{f})$ evaluated at $\mathbf{f} = \mathbf{f}_{m-1}$.
The components of the gradient $\mathbf{g}_m$ are
\[
g_{im} = \left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x_i) = f_{m-1}(x_i)}. \tag{10.35}
\]
The step length $\rho_m$ is the solution to
\[
\rho_m = \arg\min_{\rho} L(\mathbf{f}_{m-1} - \rho \mathbf{g}_m). \tag{10.36}
\]
The current solution is then updated
\[
\mathbf{f}_m = \mathbf{f}_{m-1} - \rho_m \mathbf{g}_m
\]
and the process repeated at the next iteration. Steepest descent can be viewed as a very greedy strategy, since $-\mathbf{g}_m$ is the local direction in $\mathbb{R}^N$ for which $L(\mathbf{f})$ is most rapidly decreasing at $\mathbf{f} = \mathbf{f}_{m-1}$.

10.10.2 Gradient Boosting

Forward stagewise boosting (Algorithm 10.2) is also a very greedy strategy. At each step the solution tree is the one that maximally reduces (10.29), given the current model $f_{m-1}$ and its fits $f_{m-1}(x_i)$. Thus, the tree predictions $T(x_i; \Theta_m)$ are analogous to the components of the negative gradient (10.35).
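To illustrate the unconstrained side of this analogy, here is a small sketch of steepest descent over the fitted values, implementing (10.35)-(10.36) with a numerical line search; the exponential loss is my choice of working example, not something the text specifies here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent_in_function_space(y, M=100):
    """Unconstrained minimization of L(f) = sum_i exp(-y_i f_i) over f in R^N,
    via the updates (10.35)-(10.36); y_i in {-1, +1}."""
    loss = lambda f: np.exp(-y * f).sum()                 # (10.33) with exponential loss
    f = np.zeros_like(y, dtype=float)                     # initial guess f_0
    for m in range(M):
        g = -y * np.exp(-y * f)                           # gradient components (10.35)
        rho = minimize_scalar(lambda r: loss(f - r * g),  # line search (10.36)
                              bounds=(0.0, 10.0), method="bounded").x
        f = f - rho * g                                   # update f_m
    return f

y = np.array([1, 1, -1, 1, -1], dtype=float)
f_hat = steepest_descent_in_function_space(y)
print(np.sign(f_hat))   # each fitted value is driven toward the sign of y_i
```

Since this loss can be driven toward zero by pushing each fitted value toward the sign of its own $y_i$, the sketch mainly shows the mechanics; the discussion that follows explains what changes when the step is constrained to be a tree.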
The principal difference between them is that the tree components $\mathbf{t}_m = \{T(x_1; \Theta_m), \ldots, T(x_N; \Theta_m)\}^T$ are not independent. They are constrained to be the predictions of a $J_m$-terminal node decision tree, whereas the negative gradient is the unconstrained maximal descent direction.

The solution to (10.30) in the stagewise approach is analogous to the line search (10.36) in steepest descent.
The difference is that (10.30) performs a separate line search for those components of $\mathbf{t}_m$ that correspond to each separate terminal region $\{T(x_i; \Theta_m)\}_{x_i \in R_{jm}}$.

If minimizing loss on the training data (10.33) were the only goal, steepest descent would be the preferred strategy. The gradient (10.35) is trivial to calculate for any differentiable loss function $L(y, f(x))$, whereas solving (10.29) is difficult for the robust criteria discussed in Section 10.6. Unfortunately the gradient (10.35) is defined only at the training data points $x_i$, whereas the ultimate goal is to generalize $f_M(x)$ to new data not represented in the training set.

A possible resolution to this dilemma is to induce a tree $T(x; \Theta_m)$ at the $m$th iteration whose predictions $\mathbf{t}_m$ are as close as possible to the negative gradient.
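A minimal sketch of this resolution, assuming scikit-learn's DecisionTreeRegressor as the base learner and the exponential loss as the working example (both assumptions for illustration, not the book's prescription): at each iteration the negative gradient is evaluated at the training points, a small regression tree is fit to it, and each terminal region then receives its own constant by a separate line search in the spirit of (10.30).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, M=100, max_leaf_nodes=4):
    """Boost trees fit to the negative gradient of the exponential loss, y in {-1, +1}."""
    loss = lambda yy, ff: np.exp(-yy * ff)
    f = np.zeros(len(y))
    ensemble = []
    for m in range(M):
        neg_grad = y * np.exp(-y * f)              # -dL/df at f_{m-1}: the "pseudo-residuals"
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        tree.fit(X, neg_grad)                      # tree whose predictions track -g_m
        leaf = tree.apply(X)                       # terminal region of each training point
        gammas = {}
        for j in np.unique(leaf):                  # separate line search per region, cf. (10.30)
            idx = leaf == j
            gammas[j] = minimize_scalar(
                lambda g, idx=idx: loss(y[idx], f[idx] + g).sum(),
                bounds=(-4.0, 4.0), method="bounded").x   # bounded so pure regions cannot diverge
        f += np.array([gammas[j] for j in leaf])   # update the fit region by region
        ensemble.append((tree, gammas))
    return ensemble, f

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
ensemble, f_train = gradient_boost(X, y, M=20)
print("training error:", np.mean(np.sign(f_train) != y))
```

Predictions at a new point x are obtained by passing x down each stored tree and adding the constant of the region it lands in, which is what lets the fitted model generalize beyond the training points.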