The Elements of Statistical Learning: Data Mining, Inference, and Prediction (page 64)

In this example the trees have high variance due to the correlation in the predictors. Bagging succeeds in smoothing out this variance and hence reducing the test error.

Bagging can dramatically reduce the variance of unstable procedures like trees, leading to improved prediction. A simple argument shows why bagging helps under squared-error loss, in short because averaging reduces variance and leaves bias unchanged.

[Figure 8.9: the original tree and eleven bootstrap trees (b = 1 to 11), each annotated with its top split, e.g. x.1 < 0.395.]
FIGURE 8.9. Bagging trees on simulated dataset. The top left panel shows the original tree. Eleven trees grown on bootstrap samples are shown. For each tree, the top split is annotated.

[Figure 8.10: test error versus number of bootstrap samples (0 to 200) for the original tree and bagged trees (consensus and probability averaging), with the Bayes error rate for reference.]
FIGURE 8.10. Error curves for the bagging example of Figure 8.9. Shown is the test error of the original tree and bagged trees as a function of the number of bootstrap samples. The orange points correspond to the consensus vote, while the green points average the probabilities.

Assume our training observations $(x_i, y_i)$, $i = 1, \ldots, N$, are independently drawn from a distribution $\mathcal{P}$, and consider the ideal aggregate estimator $f_{\text{ag}}(x) = \mathrm{E}_{\mathcal{P}} \hat f^*(x)$.

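Here $x$ is fixed and the bootstrap dataset $\mathbf{Z}^*$ consists of observations $(x_i^*, y_i^*)$, $i = 1, 2, \ldots, N$, sampled from $\mathcal{P}$. Note that $f_{\text{ag}}(x)$ is a bagging estimate, drawing bootstrap samples from the actual population $\mathcal{P}$ rather than the data. It is not an estimate that we can use in practice, but is convenient for analysis. We can write

$$
\begin{aligned}
\mathrm{E}_{\mathcal{P}}\big[Y - \hat f^*(x)\big]^2
&= \mathrm{E}_{\mathcal{P}}\big[Y - f_{\text{ag}}(x) + f_{\text{ag}}(x) - \hat f^*(x)\big]^2 \\
&= \mathrm{E}_{\mathcal{P}}\big[Y - f_{\text{ag}}(x)\big]^2 + \mathrm{E}_{\mathcal{P}}\big[\hat f^*(x) - f_{\text{ag}}(x)\big]^2 \\
&\ge \mathrm{E}_{\mathcal{P}}\big[Y - f_{\text{ag}}(x)\big]^2.
\end{aligned}
\tag{8.52}
$$

The cross term drops out in the second line because $Y$ is independent of $\hat f^*(x)$ and $\mathrm{E}_{\mathcal{P}} \hat f^*(x) = f_{\text{ag}}(x)$. The extra error on the right-hand side comes from the variance of $\hat f^*(x)$ around its mean $f_{\text{ag}}(x)$.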
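A small Monte Carlo sketch can make (8.52) concrete. It fixes a single input $x$ and, as hypothetical assumptions not taken from the text, takes $Y \sim N(0.5, 1)$ at that $x$ and an unstable fitted rule $\hat f^*(x)$ that outputs 1 when the mean of $n$ training responses exceeds 0.5 and 0 otherwise. Fresh training sets drawn from $\mathcal{P}$ play the role of the population bootstrap; the function name `population_aggregation_check` is illustrative only.

```python
import numpy as np


def population_aggregation_check(n: int = 10, reps: int = 200_000, seed: int = 0):
    """Monte Carlo check of the decomposition (8.52) at one fixed input x.

    Hypothetical setup: at this x the response is Y ~ N(0.5, 1), and the
    fitted rule f*(x) outputs 1.0 if the mean of n training responses
    exceeds 0.5, else 0.0 (a deliberately unstable estimator).
    """
    rng = np.random.default_rng(seed)
    # Each row is a fresh training set of size n drawn from the population P.
    train = rng.normal(0.5, 1.0, size=(reps, n))
    f_star = (train.mean(axis=1) > 0.5).astype(float)
    # Independent test responses Y at the same x.
    y = rng.normal(0.5, 1.0, size=reps)
    f_ag = f_star.mean()                       # Monte Carlo estimate of E_P f*(x)
    mse_single = np.mean((y - f_star) ** 2)    # left-hand side of (8.52)
    mse_aggregate = np.mean((y - f_ag) ** 2)   # error of the ideal aggregate
    variance = np.mean((f_star - f_ag) ** 2)   # spread of f*(x) around f_ag(x)
    return mse_single, mse_aggregate, variance


mse_single, mse_aggregate, variance = population_aggregation_check()
print(f"{mse_single:.3f} ~= {mse_aggregate:.3f} + {variance:.3f}")
```

Under these assumptions $\hat f^*(x)$ behaves like a fair coin flip, so the variance term should come out near 0.25 and the aggregate error near 1.0, with the single-fit error close to their sum.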

Therefore true population aggregation never increases mean squared error. This suggests that bagging (drawing samples from the training data) will often decrease mean squared error.

The above argument does not hold for classification under 0-1 loss, because of the nonadditivity of bias and variance. In that setting, bagging a good classifier can make it better, but bagging a bad classifier can make it worse.

Here is a simple example, using a randomized rule. Suppose $Y = 1$ for all $x$, and the classifier $\hat G(x)$ predicts $Y = 1$ (for all $x$) with probability 0.4 and predicts $Y = 0$ (for all $x$) with probability 0.6. Then the misclassification error of $\hat G(x)$ is 0.6, but that of the bagged classifier is 1.0, since with many bootstrap replicates the majority vote almost surely predicts $Y = 0$.

For classification we can understand the bagging effect in terms of a consensus of independent weak learners (Dietterich, 2000a). Let the Bayes optimal decision at $x$ be $G(x) = 1$ in a two-class example.

Suppose each of the weak learners $G^*_b$ has an error rate $e_b = e < 0.5$, and let $S_1(x) = \sum_{b=1}^{B} I(G^*_b(x) = 1)$ be the consensus vote for class 1. Since the weak learners are assumed to be independent, $S_1(x) \sim \mathrm{Bin}(B, 1 - e)$, and $\Pr(S_1 > B/2) \to 1$ as $B$ gets large. This concept has been popularized outside of statistics as the “Wisdom of Crowds” (Surowiecki, 2004): the collective knowledge of a diverse and independent body of people typically exceeds the knowledge of any single individual, and can be harnessed by voting. Of course, the main caveat here is “independent,” and bagged trees are not.
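The binomial argument can be evaluated exactly. The sketch below computes $\Pr(S_1 > B/2)$ for $S_1 \sim \mathrm{Bin}(B, 1 - e)$; the helper name `consensus_accuracy` is hypothetical, and odd $B$ is assumed so that ties cannot occur. Setting $e = 0.6$ mimics the randomized rule above, under the added assumption that each bagged copy draws its randomization independently; there the consensus becomes almost always wrong, matching the bagged error of 1.0.

```python
from math import comb


def consensus_accuracy(B: int, e: float) -> float:
    """Pr(S1 > B/2) with S1 ~ Bin(B, 1 - e).

    S1 counts votes for the Bayes class among B independent weak learners,
    each of which errs with probability e. Use odd B so ties cannot occur.
    Plain float arithmetic: intended for B up to a few hundred.
    """
    p = 1.0 - e
    return sum(comb(B, k) * p**k * e ** (B - k) for k in range(B // 2 + 1, B + 1))


for B in (1, 11, 101, 501):
    # e = 0.4: weak but better than chance; e = 0.6: the randomized rule above.
    print(B, round(consensus_accuracy(B, 0.4), 4), round(consensus_accuracy(B, 0.6), 4))
```

For much larger $B$ the terms would be better accumulated in log space, or taken from a library binomial tail function.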

Figure 8.11 illustrates the power of a consensus vote in a simulated example, where only 30% of the voters have some knowledge. In Chapter 15 we see how random forests improve on bagging by reducing the correlation between the sampled trees.

Note that when we bag a model, any simple structure in the model is lost. As an example, a bagged tree is no longer a tree. For interpretation of the model this is clearly a drawback.
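The simulation behind Figure 8.11 can be sketched as follows. The caption fixes 50 voters, 10 categories, 4 nominees and 15 randomly chosen informed voters per category; the remaining choices are assumptions of this sketch, not details from the text: uninformed voters pick uniformly at random, an informed voter who misses picks a wrong nominee uniformly, and the consensus is a plurality vote with random tie-breaking. The function name `simulate_academy` is hypothetical.

```python
import numpy as np


def simulate_academy(p, n_voters=50, n_informed=15, n_categories=10,
                     n_nominees=4, n_sims=50, seed=0):
    """Expected number correct (out of n_categories) for the plurality
    consensus and for an average individual voter."""
    rng = np.random.default_rng(seed)
    consensus, individual = [], []
    for _ in range(n_sims):
        consensus_correct = 0
        voter_correct = np.zeros(n_voters)
        for _ in range(n_categories):
            # Nominee 0 plays the role of the "correct" candidate.
            votes = rng.integers(0, n_nominees, size=n_voters)  # uninformed: uniform
            informed = rng.choice(n_voters, size=n_informed, replace=False)
            knows = rng.random(n_informed) < p
            # Assumption: an informed voter who misses picks a wrong nominee uniformly.
            wrong = rng.integers(1, n_nominees, size=n_informed)
            votes[informed] = np.where(knows, 0, wrong)
            counts = np.bincount(votes, minlength=n_nominees)
            winners = np.flatnonzero(counts == counts.max())
            consensus_correct += int(rng.choice(winners) == 0)  # random tie-break
            voter_correct += votes == 0
        consensus.append(consensus_correct)
        individual.append(voter_correct.mean())
    return float(np.mean(consensus)), float(np.mean(individual))


for p in (0.25, 0.5, 0.75):
    print(p, simulate_academy(p))
```

Under these assumptions an individual's expected score is $10\,(0.3P + 0.7 \times 0.25)$, for example 3.25 at $P = 0.5$, which gives a baseline for judging how much the plurality vote improves on it.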

More stable procedures like nearest neighbors are typically not affected much by bagging. Unfortunately, the unstable models most helped by bagging are unstable because of the emphasis on interpretability, and this is lost in the bagging process.

Figure 8.12 shows an example where bagging doesn’t help. The 100 data points shown have two features and two classes, separated by the gray linear boundary $x_1 + x_2 = 1$. We choose as our classifier $\hat G(x)$ a single axis-oriented split, choosing the split along either $x_1$ or $x_2$ that produces the largest decrease in training misclassification error. The decision boundary obtained from bagging the 0-1 decision rule over $B = 50$ bootstrap samples is shown by the blue curve in the left panel. It does a poor job of capturing the true boundary.

The single split rule, derived from the training data, splits near 0 (the middle of the range of $x_1$ or $x_2$), and hence has little contribution away from the center. Averaging the probabilities rather than the classifications does not help here. Bagging estimates the expected class probabilities from the single split rule, that is, averaged over many replications. Note that the expected class probabilities computed by bagging cannot be realized on any single replication, in the same way that a woman cannot have 2.4 children. In this sense, bagging somewhat enlarges the space of models of the individual base classifier. However, it doesn’t help in this and many other examples where a greater enlargement of the model class is needed.
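A minimal sketch of bagging the single-split rule follows. The data-generating choice (100 points uniform on the unit square, labeled by which side of $x_1 + x_2 = 1$ they fall on) is an assumption, as are the helper names `fit_stump`, `predict_stump` and `bagged_vote_fraction`. The returned vote fractions are the bagged class-1 estimates; values strictly between 0 and 1 are the kind of estimate no single stump can produce.

```python
import numpy as np


def fit_stump(X, y):
    """Single axis-oriented split minimizing training misclassification.
    Returns (feature, threshold, label_left, label_right)."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        xs = np.unique(X[:, j])
        for t in (xs[:-1] + xs[1:]) / 2:          # midpoints between sorted values
            left = X[:, j] <= t
            for lab_left in (0, 1):
                pred = np.where(left, lab_left, 1 - lab_left)
                err = np.count_nonzero(pred != y)
                if err < best_err:                 # ties keep the first split found
                    best_err, best = err, (j, t, lab_left, 1 - lab_left)
    return best


def predict_stump(stump, X):
    j, t, lab_left, lab_right = stump
    return np.where(X[:, j] <= t, lab_left, lab_right)


def bagged_vote_fraction(X, y, X_new, B=50, seed=0):
    """Fraction of B bootstrap stumps voting for class 1 at each row of X_new."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(X_new))
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample, with replacement
        votes += predict_stump(fit_stump(X[idx], y[idx]), X_new)
    return votes / B


# Hypothetical data: uniform on the unit square, class 1 above x1 + x2 = 1.
rng = np.random.default_rng(1)
X, X_test = rng.random((100, 2)), rng.random((2000, 2))
y, y_test = (X.sum(axis=1) > 1).astype(int), (X_test.sum(axis=1) > 1).astype(int)
frac = bagged_vote_fraction(X, y, X_test)
bagged_error = np.mean((frac > 0.5).astype(int) != y_test)
single_error = np.mean(predict_stump(fit_stump(X, y), X_test) != y_test)
print(single_error, bagged_error)
```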

“Boosting” is a way of achieving such an enlargement, and is described in Chapter 10. The decision boundary in the right panel of Figure 8.12 is the result of the boosting procedure, and it roughly captures the diagonal boundary.

[Figure 8.11, “Wisdom of Crowds”: expected number correct out of 10 versus P, the probability of an informed person being correct (0.25 to 1.00), for the consensus and for individuals.]
FIGURE 8.11. Simulated academy awards voting. 50 members vote in 10 categories, each with 4 nominations. For any category, only 15 voters have some knowledge, represented by their probability of selecting the “correct” candidate in that category (so P = 0.25 means they have no knowledge). For each category, the 15 experts are chosen at random from the 50. Results show the expected number correct (based on 50 simulations) for the consensus, as well as for the individuals. The error bars indicate one standard deviation. We see, for example, that if the 15 informed for a category have a 50% chance of selecting the correct candidate, the consensus doubles the expected performance of an individual.

[Figure 8.12, panels “Bagged Decision Rule” and “Boosted Decision Rule”: the 100 two-class data points with each estimated boundary.]
FIGURE 8.12. Data with two features and two classes, separated by a linear boundary. (Left panel:) Decision boundary estimated from bagging the decision rule from a single split, axis-oriented classifier. (Right panel:) Decision boundary from boosting the decision rule of the same classifier. The test error rates are 0.166 and 0.065, respectively. Boosting is described in Chapter 10.

8.8 Model Averaging and Stacking

In Section 8.4 we viewed bootstrap values of an estimator as approximate posterior values of a corresponding parameter, from a kind of nonparametric Bayesian analysis.

Viewed in this way, the bagged estimate (8.51) is an approximate posterior Bayesian mean. In contrast, the training sample estimate $\hat f(x)$ corresponds to the mode of the posterior. Since the posterior mean (not mode) minimizes squared-error loss, it is not surprising that bagging can often reduce mean squared error.

Here we discuss Bayesian model averaging more generally. We have a set of candidate models $\mathcal{M}_m$, $m = 1, \ldots, M$ for our training set $\mathbf{Z}$. These models may be of the same type with different parameter values (e.g., subsets in linear regression), or different models for the same task (e.g., neural networks and regression trees).

Suppose $\zeta$ is some quantity of interest, for example, a prediction $f(x)$ at some fixed feature value $x$. The posterior distribution of $\zeta$ is

$$
\Pr(\zeta \mid \mathbf{Z}) = \sum_{m=1}^{M} \Pr(\zeta \mid \mathcal{M}_m, \mathbf{Z}) \Pr(\mathcal{M}_m \mid \mathbf{Z}), \tag{8.53}
$$

with posterior mean

$$
\mathrm{E}(\zeta \mid \mathbf{Z}) = \sum_{m=1}^{M} \mathrm{E}(\zeta \mid \mathcal{M}_m, \mathbf{Z}) \Pr(\mathcal{M}_m \mid \mathbf{Z}). \tag{8.54}
$$

This Bayesian prediction is a weighted average of the individual predictions, with weights proportional to the posterior probability of each model.

This formulation leads to a number of different model-averaging strategies.
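Given per-model summaries, (8.53) and (8.54) reduce to a few lines. The sketch assumes each model's posterior for $\zeta$ is summarized by a mean and a variance and that the model probabilities $\Pr(\mathcal{M}_m \mid \mathbf{Z})$ are already available; `bma_posterior` is a hypothetical helper. The variance line is the law of total variance applied to the mixture (8.53), which adds the between-model spread to the average within-model variance.

```python
import numpy as np


def bma_posterior(means, variances, post_probs):
    """Combine per-model posterior summaries of zeta as in (8.53)-(8.54).

    means[m], variances[m]: posterior mean and variance of zeta under model M_m.
    post_probs[m]: Pr(M_m | Z), assumed already computed and summing to 1.
    """
    means, variances, w = map(np.asarray, (means, variances, post_probs))
    mean = np.sum(w * means)                                  # (8.54)
    # Law of total variance for the mixture (8.53): within- plus between-model spread.
    var = np.sum(w * variances) + np.sum(w * (means - mean) ** 2)
    return float(mean), float(var)


print(bma_posterior([1.0, 2.0, 4.0], [0.5, 0.5, 1.0], [0.2, 0.5, 0.3]))
# Equal weights 1/M give the unweighted committee average.
print(bma_posterior([1.0, 2.0, 4.0], [0.5, 0.5, 1.0], [1 / 3] * 3))
```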

Committee methods take a simple unweighted average of the predictions from each model, essentially giving equal probability to each model. More ambitiously, the development in Section 7.7 shows the BIC criterion can be used to estimate posterior model probabilities. This is applicable in cases where the different models arise from the same parametric model, with different parameter values. The BIC gives weight to each model depending on how well it fits and how many parameters it uses. One can also carry out the Bayesian recipe in full. If each model $\mathcal{M}_m$ has parameters $\theta_m$, we write

$$
\begin{aligned}
\Pr(\mathcal{M}_m \mid \mathbf{Z}) &\propto \Pr(\mathcal{M}_m) \cdot \Pr(\mathbf{Z} \mid \mathcal{M}_m) \\
&\propto \Pr(\mathcal{M}_m) \cdot \int \Pr(\mathbf{Z} \mid \theta_m, \mathcal{M}_m) \Pr(\theta_m \mid \mathcal{M}_m)\, d\theta_m.
\end{aligned}
\tag{8.55}
$$

In principle one can specify priors $\Pr(\theta_m \mid \mathcal{M}_m)$ and numerically compute the posterior probabilities from (8.55), to be used as model-averaging weights.
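The BIC route can be sketched as follows, using the Section 7.7 approximation $\Pr(\mathcal{M}_m \mid \mathbf{Z}) \propto \exp(-\tfrac{1}{2}\mathrm{BIC}_m)$ under equal prior model probabilities. The candidate models (polynomial regressions of degree 0 to 3 with Gaussian errors), the simulated data and the helper `bic_weights` are hypothetical choices for illustration.

```python
import numpy as np


def bic_weights(bic):
    """Approximate Pr(M_m | Z) proportional to exp(-BIC_m / 2), equal prior weights."""
    bic = np.asarray(bic, dtype=float)
    w = np.exp(-0.5 * (bic - bic.min()))   # subtract the minimum to avoid underflow
    return w / w.sum()


# Hypothetical data and candidate models: polynomials of degree 0..3.
rng = np.random.default_rng(0)
N = 100
x = rng.uniform(-1, 1, N)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.5, N)

bic, fits = [], []
for degree in range(4):
    coef = np.polyfit(x, y, degree)
    sigma2 = np.mean((y - np.polyval(coef, x)) ** 2)   # Gaussian MLE of noise variance
    loglik = -0.5 * N * (np.log(2 * np.pi * sigma2) + 1)
    d = degree + 1                                     # number of regression coefficients
    bic.append(-2 * loglik + np.log(N) * d)
    fits.append(coef)

w = bic_weights(bic)
x0 = 0.3
# BIC-weighted model average of the predictions at x0, as in (8.54).
prediction = sum(wm * np.polyval(c, x0) for wm, c in zip(w, fits))
print(np.round(w, 3), prediction)
```

Leaving the noise variance out of the parameter count $d$ is harmless here: counting it would add $\log N$ to every model's BIC and leave the weights unchanged.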

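However, we have seen no real evidence that the full Bayesian computation is worth the effort, relative to the much simpler BIC approximation.

How can we approach model averaging from a frequentist viewpoint? Given predictions $\hat f_1(x), \hat f_2(x), \ldots, \hat f_M(x)$, under squared-error loss, we can seek the weights $w = (w_1, w_2, \ldots, w_M)$ such that

$$
\hat w = \underset{w}{\operatorname{argmin}}\; \mathrm{E}_{\mathcal{P}}\Big[Y - \sum_{m=1}^{M} w_m \hat f_m(x)\Big]^2. \tag{8.56}
$$

Here the input value $x$ is fixed and the $N$ observations in the dataset $\mathbf{Z}$ (and the target $Y$) are distributed according to $\mathcal{P}$. The solution is the population linear regression of $Y$ on $\hat F(x)^T \equiv [\hat f_1(x), \hat f_2(x), \ldots, \hat f_M(x)]$:

$$
\hat w = \mathrm{E}_{\mathcal{P}}\big[\hat F(x) \hat F(x)^T\big]^{-1} \mathrm{E}_{\mathcal{P}}\big[\hat F(x) Y\big]. \tag{8.57}
$$

Now the full regression has error no larger than that of any single model, since using model $m$ alone is the special case $w_m = 1$, $w_{m'} = 0$ ($m' \neq m$) of the minimization (8.56):

$$
\mathrm{E}_{\mathcal{P}}\Big[Y - \sum_{m=1}^{M} \hat w_m \hat f_m(x)\Big]^2 \le \mathrm{E}_{\mathcal{P}}\big[Y - \hat f_m(x)\big]^2 \quad \forall m, \tag{8.58}
$$

so combining models never makes things worse, at the population level.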
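The population regression (8.57) and the inequality (8.58) can be checked by simulation, with Monte Carlo averages standing in for $\mathrm{E}_{\mathcal{P}}$. The three predictors below (a sample mean from 5 observations, a shrunken mean from 20, and a sample median from the same 5 observations) and the distribution of $Y$ are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 100_000            # Monte Carlo draws standing in for the population P
mu = 1.0               # hypothetical true mean of Y at the fixed input x

# Hypothetical predictions f_m(x), each computed from training data drawn from P.
train_small = rng.normal(mu, 1.0, size=(R, 5))
train_large = rng.normal(mu, 1.0, size=(R, 20))
F = np.column_stack([
    train_small.mean(axis=1),          # f_1: unbiased but noisy (5 observations)
    0.5 * train_large.mean(axis=1),    # f_2: less variable but biased toward 0
    np.median(train_small, axis=1),    # f_3: a different estimate from the same 5 observations
])
y = rng.normal(mu, 1.0, size=R)        # independent response Y at x

# (8.57): w = E[F F^T]^{-1} E[F Y], expectations replaced by Monte Carlo averages.
w = np.linalg.solve(F.T @ F / R, F.T @ y / R)
mse_combined = np.mean((y - F @ w) ** 2)
mse_single = np.mean((y[:, None] - F) ** 2, axis=0)
print(w, mse_combined, mse_single)
```

Because the weights minimize the Monte Carlo squared error over all $w$, and each single model is the special case of a unit weight vector, `mse_combined` can be no larger than any entry of `mse_single`, mirroring (8.58).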

Of course the population linear regression (8.57) is not available, and it is natural to replace it with the linear regression over the training set. But there are simple examples where this does not work well. For example, if $\hat f_m(x)$, $m = 1, 2, \ldots, M$ represent the prediction from the best subset of inputs of size $m$ among $M$ total inputs, then linear regression would put all of the weight on the largest model, that is, $\hat w_M = 1$, $\hat w_m = 0$, $m < M$.
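The failure described above is easy to reproduce. In this sketch, which is a hypothetical setup rather than the book's example, model $m$ is least squares on the first $m$ of $M = 4$ inputs (nested subsets stand in for best subsets), and $y$ is then regressed on the models' own training-set predictions. Every fit lies in the column space of the full input matrix, and the largest model's fit is the projection of $y$ onto that space, so the weights come out as $(0, 0, 0, 1)$ up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 4
X = rng.normal(size=(N, M))
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=N)

# Hypothetical nested models: model m uses the first m inputs (least squares, no intercept).
F = np.column_stack([
    X[:, :m] @ np.linalg.lstsq(X[:, :m], y, rcond=None)[0] for m in range(1, M + 1)
])
# Regress y on the training-set predictions themselves.
w = np.linalg.lstsq(F, y, rcond=None)[0]
print(np.round(w, 6))
```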
