The Elements of Statistical Learning: Data Mining, Inference, and Prediction (extracted text, page 81)


$\ldots, p\}$. Let $C$ be the complement set, with $S \cup C = \{1, 2, \ldots, p\}$. A general function $f(X)$ will in principle depend on all of the input variables: $f(X) = f(X_S, X_C)$. One way to define the average or partial dependence of $f(X)$ on $X_S$ is

$$f_S(X_S) = \mathrm{E}_{X_C} f(X_S, X_C). \tag{10.47}$$

This is a marginal average of $f$, and can serve as a useful description of the effect of the chosen subset on $f(X)$ when, for example, the variables in $X_S$ do not have strong interactions with those in $X_C$.

Partial dependence functions can be used to interpret the results of any "black box" learning method.

They can be estimated by

$$\bar{f}_S(X_S) = \frac{1}{N} \sum_{i=1}^{N} f(X_S, x_{iC}), \tag{10.48}$$

where $\{x_{1C}, x_{2C}, \ldots, x_{NC}\}$ are the values of $X_C$ occurring in the training data. This requires a pass over the data for each set of joint values of $X_S$ for which $\bar{f}_S(X_S)$ is to be evaluated. This can be computationally intensive, even for moderately sized data sets. Fortunately with decision trees, $\bar{f}_S(X_S)$ (10.48) can be rapidly computed from the tree itself without reference to the data (Exercise 10.11).

It is important to note that partial dependence functions defined in (10.47) represent the effect of $X_S$ on $f(X)$ after accounting for the (average) effects of the other variables $X_C$ on $f(X)$.

They are not the effect of $X_S$ on $f(X)$ ignoring the effects of $X_C$. The latter is given by the conditional expectation

$$\tilde{f}_S(X_S) = \mathrm{E}\big(f(X_S, X_C) \mid X_S\big), \tag{10.49}$$

and is the best least squares approximation to $f(X)$ by a function of $X_S$ alone. The quantities $\tilde{f}_S(X_S)$ and $\bar{f}_S(X_S)$ will be the same only in the unlikely event that $X_S$ and $X_C$ are independent. For example, if the effect of the chosen variable subset happens to be purely additive,

$$f(X) = h_1(X_S) + h_2(X_C), \tag{10.50}$$

then (10.47) produces $h_1(X_S)$ up to an additive constant. If the effect is purely multiplicative,

$$f(X) = h_1(X_S) \cdot h_2(X_C), \tag{10.51}$$

then (10.47) produces $h_1(X_S)$ up to a multiplicative constant factor.
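The difference between the data-average estimate (10.48) and the conditional expectation (10.49) can be checked numerically. The following is a minimal sketch, not the book's code: the functions `h1` and `h2`, the synthetic correlated inputs, and the window-based `conditional_mean` estimator are all assumptions chosen for illustration.

```python
import random

def partial_dependence(f, xs_value, xc_values):
    """Data-average estimate (10.48): mean of f(xs_value, x_iC) over observed x_iC."""
    return sum(f(xs_value, xc) for xc in xc_values) / len(xc_values)

def conditional_mean(f, xs_value, xs_values, xc_values, width=0.1):
    """Crude local-average estimate of (10.49): mean of f over training points
    whose X_S lies within `width` of xs_value (an illustrative choice)."""
    near = [f(xs, xc) for xs, xc in zip(xs_values, xc_values)
            if abs(xs - xs_value) <= width]
    return sum(near) / len(near)

# Hypothetical additive target of the form (10.50).
def h1(xs):
    return 2.0 * xs

def h2(xc):
    return xc ** 2

def f(xs, xc):
    return h1(xs) + h2(xc)

rng = random.Random(0)
xs_train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# X_C is deliberately made dependent on X_S.
xc_train = [xs + 0.3 * rng.gauss(0.0, 1.0) for xs in xs_train]

grid = [-1.0, 0.0, 1.0]
pd_offsets = [partial_dependence(f, v, xc_train) - h1(v) for v in grid]
ce_offsets = [conditional_mean(f, v, xs_train, xc_train) - h1(v) for v in grid]

print("partial dependence minus h1:", [round(o, 3) for o in pd_offsets])
print("conditional mean minus h1:  ", [round(o, 3) for o in ce_offsets])
```

Under these assumptions the first list is constant across the grid (the additive constant is the sample mean of `h2`), whereas the second varies with $X_S$ because $X_C$ moves together with $X_S$.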

On the other hand, (10.49) will not produce $h_1(X_S)$ in either case. In fact, (10.49) can produce strong effects on variable subsets for which $f(X)$ has no dependence at all.

Viewing plots of the partial dependence of the boosted-tree approximation (10.28) on selected variable subsets can help to provide a qualitative description of its properties. Illustrations are shown in Sections 10.8 and 10.14. Owing to the limitations of computer graphics, and human perception, the size of the subsets $X_S$ must be small ($l \approx 1, 2, 3$). There are of course a large number of such subsets, but only those chosen from among the usually much smaller set of highly relevant predictors are likely to be informative.

Also, those subsets whose effect on $f(X)$ is approximately additive (10.50) or multiplicative (10.51) will be most revealing.

For $K$-class classification, there are $K$ separate models (10.44), one for each class. Each one is related to the respective probabilities (10.21) through

$$f_k(X) = \log p_k(X) - \frac{1}{K} \sum_{l=1}^{K} \log p_l(X). \tag{10.52}$$

Thus each $f_k(X)$ is a monotone increasing function of its respective probability on a logarithmic scale. Partial dependence plots of each respective $f_k(X)$ (10.44) on its most relevant predictors (10.45) can help reveal how the log-odds of realizing that class depend on the respective input variables.

10.14 Illustrations

In this section we illustrate gradient boosting on a number of larger datasets, using different loss functions as appropriate.

10.14.1 California Housing

This data set (Pace and Barry, 1997) is available from the Carnegie-Mellon StatLib repository.² It consists of aggregated data from each of 20,460 neighborhoods (1990 census block groups) in California.

The response variable $Y$ is the median house value in each neighborhood measured in units of \$100,000. The predictor variables are demographics such as median income MedInc, housing density as reflected by the number of houses House, and the average occupancy in each house AveOccup. Also included as predictors are the location of each neighborhood (longitude and latitude), and several quantities reflecting the properties of the houses in the neighborhood: average number of rooms AveRooms and bedrooms AveBedrms. There are thus a total of eight predictors, all numeric.

We fit a gradient boosting model using the MART procedure, with $J = 6$ terminal nodes, a learning rate (10.41) of $\nu = 0.1$, and the Huber loss criterion for predicting the numeric response. We randomly divided the dataset into a training set (80%) and a test set (20%).

Figure 10.13 shows the average absolute error

$$\mathrm{AAE} = \mathrm{E}\,|y - \hat{f}_M(x)| \tag{10.53}$$

as a function of the number of iterations $M$ on both the training data and test data.
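The error measure (10.53) is simple to compute on a sample. The sketch below is illustrative only: the responses are made-up right-skewed numbers, not the California housing values, and the function name `average_absolute_error` is an assumption. It also evaluates two constant predictors, since among constants the sample median minimizes the average absolute error.

```python
import random
import statistics

def average_absolute_error(y_true, y_pred):
    """Sample version of (10.53): mean of |y - prediction|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

rng = random.Random(0)
# Made-up right-skewed responses; NOT the California housing data.
y = [rng.lognormvariate(0.5, 0.6) for _ in range(1000)]

median_y = statistics.median(y)
mean_y = statistics.fmean(y)

# Constant predictors: every observation receives the same prediction.
aae_median = average_absolute_error(y, [median_y] * len(y))
aae_mean = average_absolute_error(y, [mean_y] * len(y))

print(f"AAE of constant median predictor: {aae_median:.3f}")
print(f"AAE of constant mean predictor:   {aae_mean:.3f}")
```

A fitted model's AAE on held-out data can be judged against the constant-median value, which serves as a no-information baseline under absolute error.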

The test error is seen to decrease monotonically with increasing $M$, more rapidly during the early stages and then leveling off to being nearly constant as iterations increase. Thus, the choice of a particular value of $M$ is not critical, as long as it is not too small. This tends to be the case in many applications. The shrinkage strategy (10.41) tends to eliminate the problem of overfitting, especially for larger data sets.

The value of AAE after 800 iterations is 0.31.

This can be compared to that of the optimal constant predictor $\mathrm{median}\{y_i\}$, which is 0.89. In terms of more familiar quantities, the squared multiple correlation coefficient of this model is $R^2 = 0.84$. Pace and Barry (1997) use a sophisticated spatial autoregression procedure, where prediction for each neighborhood is based on median house values in nearby neighborhoods, using the other predictors as covariates.

Experimenting with transformations they achieved $R^2 = 0.85$, predicting $\log Y$. Using $\log Y$ as the response the corresponding value for gradient boosting was $R^2 = 0.86$.

² http://lib.stat.cmu.edu

FIGURE 10.13. Average-absolute error as a function of number of iterations for the California housing data. (Plot titled "Training and Test Absolute Error": absolute error against iterations $M$, with train error and test error curves.)

Figure 10.14 displays the relative variable importances for each of the eight predictor variables. Not surprisingly, median income in the neighborhood is the most relevant predictor.

Longitude, latitude, and average occupancy all have roughly half the relevance of income, whereas the others are somewhat less influential.

Figure 10.15 shows single-variable partial dependence plots on the most relevant nonlocation predictors. Note that the plots are not strictly smooth. This is a consequence of using tree-based models. Decision trees produce discontinuous piecewise constant models (10.25).

This carries over to sums of trees (10.28), with of course many more pieces. Unlike most of the methods discussed in this book, there is no smoothness constraint imposed on the result. Arbitrarily sharp discontinuities can be modeled. These curves generally exhibit a smooth trend because that is what is estimated to best predict the response for this problem. This is often the case.

The hash marks at the base of each plot delineate the deciles of the data distribution of the corresponding variables. Note that here the data density is lower near the edges, especially for larger values.

This causes the curves to be somewhat less well determined in those regions. The vertical scales of the plots are the same, and give a visual comparison of the relative importance of the different variables.

FIGURE 10.14. Relative importance of the predictors for the California housing data. (Bar chart of relative importance on a 0 to 100 scale for Population, AveBedrms, AveRooms, HouseAge, Latitude, AveOccup, Longitude, and MedInc.)

The partial dependence of median house value on median income is monotonic increasing, being nearly linear over the main body of data. House value is generally monotonic decreasing with increasing average occupancy, except perhaps for average occupancy rates less than one. Median house value has a nonmonotonic partial dependence on average number of rooms. It has a minimum at approximately three rooms and is increasing both for smaller and larger values.

Median house value is seen to have a very weak partial dependence on house age that is inconsistent with its importance ranking (Figure 10.14). This suggests that this weak main effect may be masking stronger interaction effects with other variables.
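How a nearly flat partial dependence curve can coexist with a strong influence on the prediction is easiest to see with a pure interaction. The sketch below is a toy illustration under stated assumptions: the target `f(x1, x2) = x1 * x2` and the synthetic, centered `x2` sample are hypothetical and have nothing to do with the housing data.

```python
import random

def partial_dependence(f, x1_value, x2_values):
    """Data-average estimate in the style of (10.48): mean of f(x1_value, x2) over observed x2."""
    return sum(f(x1_value, x2) for x2 in x2_values) / len(x2_values)

# Hypothetical pure-interaction target: x1 matters only through its product with x2.
def f(x1, x2):
    return x1 * x2

rng = random.Random(0)
x2_raw = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
mean_x2 = sum(x2_raw) / len(x2_raw)
x2_train = [x - mean_x2 for x in x2_raw]  # center so the sample mean of x2 is zero

grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Averaging over x2 cancels the effect of x1: the curve is flat at zero.
pd_x1 = [partial_dependence(f, v, x2_train) for v in grid]

# At any fixed x2, however, f changes strongly (and in opposite directions) with x1.
slice_high = [f(v, 0.8) for v in grid]
slice_low = [f(v, -0.8) for v in grid]

print("partial dependence on x1:", [round(p, 6) for p in pd_x1])
print("f(x1, x2=0.8): ", slice_high)
print("f(x1, x2=-0.8):", slice_low)
```

In such a case a two-variable partial dependence plot on the pair of interacting variables would be more informative than either single-variable plot.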
