FIGURE 9.7. Illustration of PRIM algorithm. There are two classes, indicated by the blue (class 0) and red (class 1) points. The procedure starts with a rectangle (broken black lines) surrounding all of the data, and then peels away points along one edge by a prespecified amount in order to maximize the mean of the points remaining in the box. Starting at the top left panel, the sequence of peelings is shown, until a pure red region is isolated in the bottom right panel. The iteration number is indicated at the top of each panel.

FIGURE 9.8. Box mean as a function of number of observations in the box.

Algorithm 9.3 Patient Rule Induction Method.
1. Start with all of the training data, and a maximal box containing all of the data.

2. Consider shrinking the box by compressing one face, so as to peel off the proportion α of observations having either the highest values of a predictor Xj, or the lowest. Choose the peeling that produces the highest response mean in the remaining box. (Typically α = 0.05 or 0.10.)

3. Repeat step 2 until some minimal number of observations (say 10) remain in the box.

4. Expand the box along any face, as long as the resulting box mean increases.

5. Steps 1–4 give a sequence of boxes, with different numbers of observations in each box. Use cross-validation to choose a member of the sequence. Call the box B1.

6. Remove the data in box B1 from the dataset and repeat steps 2–5 to obtain a second box, and continue to get as many boxes as desired.

PRIM is designed for regression (quantitative response variable); a two-class outcome can be handled simply by coding it as 0 and 1. There is no simple way to deal with k > 2 classes simultaneously: one approach is to run PRIM separately for each class versus a baseline class.
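To make the top-down peeling of Algorithm 9.3 concrete, here is a minimal Python sketch, assuming a NumPy matrix X of predictors and vector y of responses; the names peel_once and prim_box are illustrative, not from the book, and the sketch covers only the peeling (steps 2–3), omitting the pasting step 4 and the cross-validation of step 5.

import numpy as np

def peel_once(X, y, in_box, alpha=0.10):
    # One peeling step (step 2): try removing the fraction alpha of in-box points
    # with the lowest or highest values of each predictor, and keep the peel that
    # leaves the highest mean response in the box.
    best_mask, best_mean = in_box, -np.inf
    idx = np.where(in_box)[0]
    n_peel = max(1, int(np.floor(alpha * idx.size)))
    for j in range(X.shape[1]):
        order = idx[np.argsort(X[idx, j])]
        for kept in (order[n_peel:], order[:-n_peel]):   # peel low end or high end of X_j
            mask = np.zeros_like(in_box)
            mask[kept] = True
            m = y[mask].mean()
            if m > best_mean:
                best_mask, best_mean = mask, m
    return best_mask

def prim_box(X, y, alpha=0.10, min_obs=10):
    # Repeat the peeling until roughly min_obs observations remain (step 3).
    in_box = np.ones(len(y), dtype=bool)
    while in_box.sum() * (1 - alpha) >= min_obs:
        in_box = peel_once(X, y, in_box, alpha)
    return in_box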
An advantage of PRIM over CART is its patience. Because of its binary splits, CART fragments the data quite quickly. Assuming splits of equal size, with N observations it can only make log2(N) − 1 splits before running out of data. If PRIM peels off a proportion α of training points at each stage, it can perform approximately −log(N)/log(1 − α) peeling steps before running out of data. For example, if N = 128 and α = 0.10, then log2(N) − 1 = 6 while −log(N)/log(1 − α) ≈ 46. Taking into account that there must be an integer number of observations at each stage, PRIM in fact can peel only 29 times. In any case, the ability of PRIM to be more patient should help the top-down greedy algorithm find a better solution.

9.3.1 Spam Example (Continued)

We applied PRIM to the spam data, with the response coded as 1 for spam and 0 for email. The first two boxes found by PRIM are summarized below:

                Global Mean   Box Mean   Box Support
  Rule 1
    Training      0.3931       0.9607      0.1413
    Test          0.3958       1.0000      0.1536

  Rule 1:   ch! > 0.029
            CAPAVE > 2.331
            your > 0.705
            1999 < 0.040
            CAPTOT > 79.50
            edu < 0.070
            re < 0.535
            ch; < 0.030

                Remain Mean   Box Mean   Box Support
  Rule 2
    Training      0.2998       0.9560      0.1043
    Test          0.2862       0.9264      0.1061

  Rule 2:   remove > 0.010
            george < 0.110

The box support is the proportion of observations falling in the box. The first box is purely spam, and contains about 15% of the test data. The second box contains 10.6% of the test observations, 92.6% of which are spam.
Together the two boxes contain 26% of the data and are about 97% spam. The next few boxes (not shown) are quite small, containing only about 3% of the data.

The predictors are listed in order of importance. Interestingly, the top splitting variables in the CART tree (Figure 9.5) do not appear in PRIM's first box.

9.4 MARS: Multivariate Adaptive Regression Splines

MARS is an adaptive procedure for regression, and is well suited for high-dimensional problems (i.e., a large number of inputs). It can be viewed as a generalization of stepwise linear regression or a modification of the CART method to improve the latter's performance in the regression setting.
We introduce MARS from the first point of view, and later make the connection to CART.

MARS uses expansions in piecewise linear basis functions of the form (x − t)+ and (t − x)+. The "+" means positive part, so

(x - t)_+ = \begin{cases} x - t, & \text{if } x > t, \\ 0, & \text{otherwise,} \end{cases}
\qquad\text{and}\qquad
(t - x)_+ = \begin{cases} t - x, & \text{if } x < t, \\ 0, & \text{otherwise.} \end{cases}

FIGURE 9.9. The basis functions (x − t)+ (solid orange) and (t − x)+ (broken blue) used by MARS.

As an example, the functions (x − 0.5)+ and (0.5 − x)+ are shown in Figure 9.9. Each function is piecewise linear, with a knot at the value t. In the terminology of Chapter 5, these are linear splines. We call the two functions a reflected pair in the discussion below. The idea is to form reflected pairs for each input Xj with knots at each observed value xij of that input. Therefore, the collection of basis functions is

\mathcal{C} = \bigl\{ (X_j - t)_+,\ (t - X_j)_+ \bigr\}, \quad t \in \{x_{1j}, x_{2j}, \ldots, x_{Nj}\}, \; j = 1, 2, \ldots, p. \tag{9.18}
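As an illustration of the reflected pairs in (9.18), the short Python sketch below builds the collection C for a small data matrix; the names hinge and reflected_pair_basis are my own, not from the text. With all input values distinct it yields 2Np basis functions, each depending on a single coordinate but defined on all of IR^p.

import numpy as np

def hinge(x, t, sign):
    # (x - t)_+ if sign = +1, (t - x)_+ if sign = -1 (positive-part linear spline).
    return np.maximum(sign * (x - t), 0.0)

def reflected_pair_basis(X):
    # The collection C of (9.18): a reflected pair for every predictor j and knot x_ij.
    # Returns a list of callables h such that h(Z) evaluates the basis function on rows of Z.
    N, p = X.shape
    basis = []
    for j in range(p):
        for t in np.unique(X[:, j]):                                  # knots at observed values of X_j
            basis.append(lambda Z, j=j, t=t: hinge(Z[:, j], t, +1))   # (X_j - t)_+
            basis.append(lambda Z, j=j, t=t: hinge(Z[:, j], t, -1))   # (t - X_j)_+
    return basis

# Example: N = 5 distinct values per predictor and p = 2 gives 2 * 5 * 2 = 20 functions.
X = np.random.default_rng(0).uniform(size=(5, 2))
print(len(reflected_pair_basis(X)))   # 20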
If all of the input values are distinct, there are 2Np basis functions altogether. Note that although each basis function depends only on a single Xj, for example, h(X) = (Xj − t)+, it is considered as a function over the entire input space IRp.

The model-building strategy is like a forward stepwise linear regression, but instead of using the original inputs, we are allowed to use functions from the set C and their products. Thus the model has the form

f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X), \tag{9.19}

where each hm(X) is a function in C, or a product of two or more such functions. Given a choice for the hm, the coefficients βm are estimated by minimizing the residual sum-of-squares, that is, by standard linear regression.
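For a fixed set of basis functions hm, fitting the coefficients in (9.19) is ordinary least squares on the transformed inputs. A minimal sketch, assuming a list of basis functions such as the output of the hypothetical reflected_pair_basis above; fit_coefficients and predict are my own names.

import numpy as np

def fit_coefficients(basis, X, y):
    # Least-squares estimates of beta_0, ..., beta_M in (9.19) for fixed h_1, ..., h_M.
    H = np.column_stack([np.ones(len(y))] + [h(X) for h in basis])  # design matrix [1, h_1(X), ..., h_M(X)]
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)                    # minimizes the residual sum-of-squares
    return beta

def predict(beta, basis, X):
    # Evaluate f(X) = beta_0 + sum_m beta_m h_m(X).
    H = np.column_stack([np.ones(len(X))] + [h(X) for h in basis])
    return H @ beta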
The real art, however, is in the construction of the functions hm(x). We start with only the constant function h0(X) = 1 in our model, and all functions in the set C are candidate functions. This is depicted in Figure 9.10.

At each stage we consider as a new basis function pair all products of a function hm in the model set M with one of the reflected pairs in C. We add to the model M the term of the form

β̂M+1 hℓ(X) · (Xj − t)+ + β̂M+2 hℓ(X) · (t − Xj)+,   hℓ ∈ M,

that produces the largest decrease in training error. Here β̂M+1 and β̂M+2 are coefficients estimated by least squares, along with all the other M + 1 coefficients in the model.

FIGURE 9.10. Schematic of the MARS forward model-building procedure. On the left are the basis functions currently in the model: initially, this is the constant function h(X) = 1. On the right are all candidate basis functions to be considered in building the model. These are pairs of piecewise linear basis functions as in Figure 9.9, with knots t at all unique observed values xij of each predictor Xj. At each stage we consider all products of a candidate pair with a basis function in the model. The product that decreases the residual error the most is added into the current model. Above we illustrate the first three steps of the procedure, with the selected functions shown in red.

FIGURE 9.11. The function h(X1, X2) = (X1 − x51)+ · (x72 − X2)+, resulting from multiplication of two piecewise linear MARS basis functions.
Then the winning products are added to the model and the process is continued until the model set M contains some preset maximum number of terms.

For example, at the first stage we consider adding to the model a function of the form β1(Xj − t)+ + β2(t − Xj)+, t ∈ {xij}, since multiplication by the constant function just produces the function itself. Suppose the best choice is β̂1(X2 − x72)+ + β̂2(x72 − X2)+. Then this pair of basis functions is added to the set M, and at the next stage we consider including a pair of products of the form

hm(X) · (Xj − t)+   and   hm(X) · (t − Xj)+,   t ∈ {xij},

where for hm we have the choices

h0(X) = 1,
h1(X) = (X2 − x72)+, or
h2(X) = (x72 − X2)+.

The third choice produces functions such as (X1 − x51)+ · (x72 − X2)+, depicted in Figure 9.11. At the end of this process we have a large model of the form (9.19).
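The forward step just described can be sketched in a few lines of Python, continuing the hypothetical helpers hinge, fit_coefficients, and predict from the earlier snippets; forward_step is my own name, and the efficiency tricks and input-reuse restrictions of the actual MARS algorithm are ignored. For every function already in the model, every predictor, and every knot, it tries adding the reflected-pair product and keeps the pair giving the smallest residual sum-of-squares.

import numpy as np

def forward_step(model, X, y):
    # Add the reflected-pair product that most decreases the training RSS.
    # model: list of basis functions currently in the model (the constant h_0 is implicit).
    candidates = [lambda Z: np.ones(len(Z))] + model        # h_0(X) = 1 plus current terms
    best_rss, best_pair = np.inf, None
    for hm in candidates:
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                pair = [lambda Z, hm=hm, j=j, t=t: hm(Z) * hinge(Z[:, j], t, +1),
                        lambda Z, hm=hm, j=j, t=t: hm(Z) * hinge(Z[:, j], t, -1)]
                trial = model + pair
                beta = fit_coefficients(trial, X, y)        # refit all coefficients by least squares
                rss = np.sum((y - predict(beta, trial, X)) ** 2)
                if rss < best_rss:
                    best_rss, best_pair = rss, pair
    return model + best_pair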
This model typically overfits the data, and so a backward deletion procedure is applied. The term whose removal causes the smallest increase in residual squared error is deleted from the model at each stage, producing an estimated best model f̂λ of each size (number of terms) λ. One could use cross-validation to estimate the optimal value of λ, but for computational savings the MARS procedure instead uses generalized cross-validation.
This criterion is defined as

\mathrm{GCV}(\lambda) = \frac{\sum_{i=1}^{N} \bigl(y_i - \hat{f}_\lambda(x_i)\bigr)^2}{\bigl(1 - M(\lambda)/N\bigr)^2}. \tag{9.20}

The value M(λ) is the effective number of parameters in the model: this accounts both for the number of terms in the model, plus the number of parameters used in selecting the optimal positions of the knots. Some mathematical and simulation results suggest that one should pay a price of three parameters for selecting a knot in a piecewise linear regression. Thus if there are r linearly independent basis functions in the model, and K knots were selected in the forward process, the formula is M(λ) = r + cK, where c = 3.
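As a sketch of how (9.20) might be computed along the backward-deletion sequence: the helper names gcv and backward_delete are mine, not part of any MARS software; I reuse the hypothetical fit_coefficients and predict from the earlier snippets, charge c = 3 parameters per knot, and track the number of knots per term with a simple list.

import numpy as np

def gcv(y, fitted, r, K, c=3.0):
    # Generalized cross-validation criterion (9.20), with M(lambda) = r + c*K.
    N = len(y)
    M_lambda = r + c * K
    rss = np.sum((y - fitted) ** 2)
    return rss / (1.0 - M_lambda / N) ** 2

def backward_delete(model, knots_per_term, X, y):
    # Delete one term at a time, recording the GCV of each model size.
    results = []
    while model:
        beta = fit_coefficients(model, X, y)
        fitted = predict(beta, model, X)
        K = sum(knots_per_term)                       # knots used by the current terms
        results.append((len(model), gcv(y, fitted, len(model) + 1, K)))
        # Drop the term whose removal increases the residual sum-of-squares the least.
        rss_without = []
        for i in range(len(model)):
            trial = model[:i] + model[i + 1:]
            b = fit_coefficients(trial, X, y)
            rss_without.append(np.sum((y - predict(b, trial, X)) ** 2))
        i_best = int(np.argmin(rss_without))
        model = model[:i_best] + model[i_best + 1:]
        knots_per_term = knots_per_term[:i_best] + knots_per_term[i_best + 1:]
    return results                                    # (size, GCV) pairs; choose the size minimizing GCV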