Then
\[
\begin{aligned}
\hat f(x_0) &= b(x_0)^T \bigl(\mathbf{B}^T \mathbf{W}(x_0)\mathbf{B}\bigr)^{-1} \mathbf{B}^T \mathbf{W}(x_0)\mathbf{y} \qquad (6.8)\\
            &= \sum_{i=1}^{N} l_i(x_0)\, y_i. \qquad (6.9)
\end{aligned}
\]
Equation (6.8) gives an explicit expression for the local linear regression estimate, and (6.9) highlights the fact that the estimate is linear in the y_i (the l_i(x_0) do not involve y). These weights l_i(x_0) combine the weighting kernel K_λ(x_0, ·) and the least squares operations, and are sometimes referred to as the equivalent kernel.

FIGURE 6.4. (Panels: "Local Linear Equivalent Kernel at Boundary", "Local Linear Equivalent Kernel in Interior".) The green points show the equivalent kernel l_i(x_0) for local regression. These are the weights in f̂(x_0) = Σ_{i=1}^N l_i(x_0) y_i, plotted against their corresponding x_i. For display purposes, these have been rescaled, since in fact they sum to 1. Since the yellow shaded region is the (rescaled) equivalent kernel for the Nadaraya–Watson local average, we see how local regression automatically modifies the weighting kernel to correct for biases due to asymmetry in the smoothing window.
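As a concrete illustration of (6.8)–(6.9), the following sketch forms the weighted least squares solution at a single target point and extracts the equivalent-kernel weights l_i(x_0); a tri-cube kernel with metric bandwidth λ is assumed, and all names are illustrative rather than taken from any particular package.

```python
# A minimal sketch of (6.8)-(6.9), assuming a tri-cube kernel with metric
# bandwidth lambda; function and variable names are illustrative.
import numpy as np

def tricube(t):
    """Tri-cube weight D(t) = (1 - |t|^3)^3 for |t| < 1, and 0 otherwise."""
    t = np.abs(t)
    return np.where(t < 1, (1 - t**3)**3, 0.0)

def local_linear(x0, x, y, lam):
    """Return (fhat(x0), equivalent-kernel weights l(x0)) as in (6.8)-(6.9)."""
    w = tricube((x - x0) / lam)                  # K_lambda(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])    # rows b(x_i)^T = (1, x_i)
    W = np.diag(w)                               # W(x0)
    # l(x0)^T = b(x0)^T (B^T W B)^{-1} B^T W  -- computed from the x_i only
    l = np.array([1.0, x0]) @ np.linalg.solve(B.T @ W @ B, B.T @ W)
    return l @ y, l

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(scale=0.3, size=100)
fhat, l = local_linear(0.05, x, y, lam=0.2)      # target near the left boundary
print(fhat, l.sum())                             # the weights sum to 1
```

The printed weights sum to 1, the first of the moment conditions used in the bias expansion below.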
Figure 6.4 illustrates the effect of local linear regression on the equivalent kernel. Historically, the bias in the Nadaraya–Watson and other local average kernel methods was corrected by modifying the kernel. These modifications were based on theoretical asymptotic mean-square-error considerations, and besides being tedious to implement, are only approximate for finite sample sizes.
Local linear regression automatically modifies the kernel to correct the bias exactly to first order, a phenomenon dubbed automatic kernel carpentry. Consider the following expansion for Ef̂(x_0), using the linearity of local regression and a series expansion of the true function f around x_0:
\[
\begin{aligned}
\mathrm{E}\hat f(x_0) &= \sum_{i=1}^{N} l_i(x_0)\, f(x_i)\\
&= f(x_0)\sum_{i=1}^{N} l_i(x_0) + f'(x_0)\sum_{i=1}^{N} (x_i - x_0)\, l_i(x_0)\\
&\qquad + \frac{f''(x_0)}{2}\sum_{i=1}^{N} (x_i - x_0)^2\, l_i(x_0) + R, \qquad (6.10)
\end{aligned}
\]
where the remainder term R involves third- and higher-order derivatives of f, and is typically small under suitable smoothness assumptions. It can be shown (Exercise 6.2) that for local linear regression, Σ_{i=1}^N l_i(x_0) = 1 and Σ_{i=1}^N (x_i − x_0) l_i(x_0) = 0. Hence the middle term equals f(x_0), and since the bias is Ef̂(x_0) − f(x_0), we see that it depends only on quadratic and higher-order terms in the expansion of f.

FIGURE 6.5. (Panels: "Local Linear in Interior", "Local Quadratic in Interior".) Local linear fits exhibit bias in regions of curvature of the true function. Local quadratic fits tend to eliminate this bias.

6.1.2 Local Polynomial Regression

Why stop at local linear fits? We can fit local polynomials of any degree d,
\[
\min_{\alpha(x_0),\,\beta_j(x_0),\; j=1,\dots,d}\; \sum_{i=1}^{N} K_\lambda(x_0, x_i)\Bigl[\, y_i - \alpha(x_0) - \sum_{j=1}^{d}\beta_j(x_0)\, x_i^j \Bigr]^2, \qquad (6.11)
\]
with solution f̂(x_0) = α̂(x_0) + Σ_{j=1}^d β̂_j(x_0) x_0^j.
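The weight identities quoted above, and the degree-d generalization behind (6.11), are easy to verify numerically. A sketch under the same assumptions (tri-cube kernel; illustrative names):

```python
# A sketch of a degree-d local polynomial fit (6.11) via its equivalent
# kernel, plus a numerical check of the identities cited above:
# sum_i l_i(x0) = 1 and sum_i (x_i - x0) l_i(x0) = 0 for local linear fits
# (and the analogous vanishing of the quadratic moment for d = 2).
import numpy as np

def tricube(t):
    t = np.abs(t)
    return np.where(t < 1, (1 - t**3)**3, 0.0)

def equiv_kernel(x0, x, lam, d):
    """Equivalent-kernel weights l(x0) for a degree-d local polynomial fit."""
    w = tricube((x - x0) / lam)
    B = np.vander(x, N=d + 1, increasing=True)            # rows (1, x_i, ..., x_i^d)
    b0 = np.vander(np.array([x0]), N=d + 1, increasing=True)[0]
    WB = B * w[:, None]                                    # W(x0) B without forming W
    return b0 @ np.linalg.solve(B.T @ WB, WB.T)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
x0, lam = 0.5, 0.25
l1 = equiv_kernel(x0, x, lam, d=1)                         # local linear
print(l1.sum(), ((x - x0) * l1).sum())                     # ~1 and ~0
l2 = equiv_kernel(x0, x, lam, d=2)                         # local quadratic
print(((x - x0)**2 * l2).sum())                            # ~0: kills the f'' term too
```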
In fact, an expansion such as (6.10) will tell us that the bias will only have components of degree d+1 and higher (Exercise 6.2). Figure 6.5 illustrates local quadratic regression. Local linear fits tend to be biased in regions of curvature of the true function, a phenomenon referred to as trimming the hills and filling the valleys. Local quadratic regression is generally able to correct this bias.

There is of course a price to be paid for this bias reduction, and that is increased variance. The fit in the right panel of Figure 6.5 is slightly more wiggly, especially in the tails.
Assuming the model y_i = f(x_i) + ε_i, with the ε_i independent and identically distributed with mean zero and variance σ^2, Var(f̂(x_0)) = σ^2 ||l(x_0)||^2, where l(x_0) is the vector of equivalent kernel weights at x_0. It can be shown (Exercise 6.3) that ||l(x_0)|| increases with d, and so there is a bias–variance tradeoff in selecting the polynomial degree. Figure 6.6 illustrates these variance curves for degree zero, one and two local polynomials.

FIGURE 6.6. The variance functions ||l(x)||^2 for local constant, linear and quadratic regression, for a metric bandwidth (λ = 0.2) tri-cube kernel.
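The variance comparison in Figure 6.6 can be reproduced approximately from the equivalent kernels, since Var(f̂(x_0)) = σ^2 ||l(x_0)||^2. A sketch, again assuming a tri-cube kernel with metric bandwidth λ = 0.2 and illustrative names:

```python
# A rough numerical analogue of Figure 6.6: the factor ||l(x0)||^2 in
# Var(fhat(x0)) = sigma^2 ||l(x0)||^2, for local constant, linear and
# quadratic fits with a metric-width (lambda = 0.2) tri-cube kernel.
import numpy as np

def tricube(t):
    t = np.abs(t)
    return np.where(t < 1, (1 - t**3)**3, 0.0)

def variance_factor(x0, x, lam, d):
    """||l(x0)||^2 for a degree-d local polynomial fit at x0."""
    w = tricube((x - x0) / lam)
    B = np.vander(x, N=d + 1, increasing=True)
    b0 = np.vander(np.array([x0]), N=d + 1, increasing=True)[0]
    WB = B * w[:, None]
    l = b0 @ np.linalg.solve(B.T @ WB, WB.T)
    return float(l @ l)

x = np.linspace(0, 1, 100)
for d, name in enumerate(["constant", "linear", "quadratic"]):
    at_boundary = variance_factor(0.0, x, lam=0.2, d=d)
    in_interior = variance_factor(0.5, x, lam=0.2, d=d)
    print(f"{name}: boundary {at_boundary:.3f}, interior {in_interior:.3f}")
# Expect the quadratic fit to be much more variable at the boundary, with the
# three curves much closer together in the interior.
```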
To summarize some collected wisdom on this issue:

• Local linear fits can reduce bias dramatically at the boundaries at a modest cost in variance. Local quadratic fits do little at the boundaries for bias, but increase the variance a lot.

• Local quadratic fits tend to be most helpful in reducing bias due to curvature in the interior of the domain.

• Asymptotic analysis suggests that local polynomials of odd degree dominate those of even degree. This is largely due to the fact that asymptotically the MSE is dominated by boundary effects.

While it may be helpful to tinker, and move from local linear fits at the boundary to local quadratic fits in the interior, we do not recommend such strategies.
Usually the application will dictate the degree of the fit. For example, if we are interested in extrapolation, then the boundary is of more interest, and local linear fits are probably more reliable.

6.2 Selecting the Width of the Kernel

In each of the kernels K_λ, λ is a parameter that controls its width:

• For the Epanechnikov or tri-cube kernel with metric width, λ is the radius of the support region.

• For the Gaussian kernel, λ is the standard deviation.

• λ is the number k of nearest neighbors in k-nearest neighborhoods, often expressed as a fraction or span k/N of the total training sample (the metric and nearest-neighbor conventions are contrasted in the sketch below).
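To make the contrast between the metric and nearest-neighbor conventions concrete, here is a brief sketch; knn_radius is a hypothetical helper, not a library function:

```python
# A small sketch contrasting a fixed metric width with the adaptive width
# implied by a nearest-neighbor span.
import numpy as np

def knn_radius(x0, x, span):
    """Adaptive width h(x0): distance to the k-th nearest x_i, k = span * N."""
    k = max(1, int(np.ceil(span * len(x))))
    return np.sort(np.abs(x - x0))[k - 1]

rng = np.random.default_rng(2)
x = np.sort(rng.beta(2, 5, size=200))      # irregularly spaced inputs
for x0 in (0.1, 0.8):
    print(f"x0 = {x0}: metric width 0.20, 30% span width {knn_radius(x0, x, 0.3):.3f}")
# A metric window keeps a constant radius (so the effective number of points
# varies); a nearest-neighbor window widens in sparse regions and narrows in
# dense ones.
```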
FIGURE 6.7. Equivalent kernels for a local linear regression smoother (tri-cube kernel; orange) and a smoothing spline (blue), with matching degrees of freedom. The vertical spikes indicate the target points.

There is a natural bias–variance tradeoff as we change the width of the averaging window, which is most explicit for local averages:

• If the window is narrow, f̂(x_0) is an average of a small number of y_i close to x_0, and its variance will be relatively large, close to that of an individual y_i. The bias will tend to be small, again because each of the E(y_i) = f(x_i) should be close to f(x_0).

• If the window is wide, the variance of f̂(x_0) will be small relative to the variance of any y_i, because of the effects of averaging. The bias will be higher, because we are now using observations x_i further from x_0, and there is no guarantee that f(x_i) will be close to f(x_0).

Similar arguments apply to local regression estimates, say local linear: as the width goes to zero, the estimates approach a piecewise-linear function that interpolates the training data¹; as the width gets infinitely large, the fit approaches the global linear least-squares fit to the data; a numerical check of both limits is sketched below.

The discussion in Chapter 5 on selecting the regularization parameter for smoothing splines applies here, and will not be repeated.
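Both limits are easy to check numerically for a local linear smoother; the sketch below (illustrative names, uniformly spaced x_i) compares a very wide window against the global least-squares line, and a narrow window against the observed y_i:

```python
# A quick check of the two limits, assuming uniformly spaced x_i and a
# tri-cube kernel; names are illustrative.
import numpy as np

def tricube(t):
    t = np.abs(t)
    return np.where(t < 1, (1 - t**3)**3, 0.0)

def local_linear_fit(x0, x, y, lam):
    w = tricube((x - x0) / lam)
    B = np.column_stack([np.ones_like(x), x])
    WB = B * w[:, None]
    beta = np.linalg.solve(B.T @ WB, WB.T @ y)   # weighted least squares at x0
    return beta[0] + beta[1] * x0

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 101)
y = np.sin(4 * x) + rng.normal(scale=0.05, size=101)

ols = np.polyfit(x, y, deg=1)                    # global least-squares line
print(local_linear_fit(0.3, x, y, lam=1e6), np.polyval(ols, 0.3))   # nearly equal
print(local_linear_fit(x[30], x, y, lam=0.03), y[30])   # narrow window: close to y_i
```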
Local regression smoothers are linear estimators; the smoother matrix in f̂ = S_λ y is built up from the equivalent kernels (6.8), and has ijth entry {S_λ}_ij = l_j(x_i). Leave-one-out cross-validation is particularly simple (Exercise 6.7), as are generalized cross-validation, C_p (Exercise 6.10), and k-fold cross-validation. The effective degrees of freedom is again defined as trace(S_λ), and can be used to calibrate the amount of smoothing.
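The following sketch pulls these pieces together for a local linear smoother with a tri-cube kernel; all names are illustrative, and the leave-one-out residuals use the standard identity for linear smoothers fit by (weighted) least squares:

```python
# A sketch of the smoother-matrix view: build S_lambda row by row from the
# equivalent kernels, read off the effective degrees of freedom
# trace(S_lambda), and form leave-one-out residuals as
# (y_i - fhat(x_i)) / (1 - {S_lambda}_ii).
import numpy as np

def tricube(t):
    t = np.abs(t)
    return np.where(t < 1, (1 - t**3)**3, 0.0)

def smoother_matrix(x, lam):
    """Row j holds the equivalent kernel at target x_j: S[j, i] = l_i(x_j)."""
    B = np.column_stack([np.ones_like(x), x])
    S = np.empty((len(x), len(x)))
    for j, x0 in enumerate(x):
        w = tricube((x - x0) / lam)
        WB = B * w[:, None]
        S[j] = np.array([1.0, x0]) @ np.linalg.solve(B.T @ WB, WB.T)
    return S

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(scale=0.3, size=100)
S = smoother_matrix(x, lam=0.3)
fhat = S @ y
print("effective df:", np.trace(S))
loo_resid = (y - fhat) / (1 - np.diag(S))        # leave-one-out residuals
print("LOOCV estimate of squared prediction error:", np.mean(loo_resid**2))
```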
Figure 6.7 compares the equivalent kernels for a smoothing spline and local linear regression. The local regression smoother has a span of 40%, which results in df = trace(S_λ) = 5.86. The smoothing spline was calibrated to have the same df, and their equivalent kernels are qualitatively quite similar.

¹ With uniformly spaced x_i; with irregularly spaced x_i, the behavior can deteriorate.

6.3 Local Regression in IR^p

Kernel smoothing and local regression generalize very naturally to two or more dimensions. The Nadaraya–Watson kernel smoother fits a constant locally with weights supplied by a p-dimensional kernel. Local linear regression will fit a hyperplane locally in X, by weighted least squares, with weights supplied by a p-dimensional kernel.
It is simple to implement and is generally preferred to the local constant fit for its superior performance on the boundaries.

Let b(X) be a vector of polynomial terms in X of maximum degree d. For example, with d = 1 and p = 2 we get b(X) = (1, X_1, X_2); with d = 2 we get b(X) = (1, X_1, X_2, X_1^2, X_2^2, X_1 X_2); and trivially with d = 0 we get b(X) = 1.
At each x_0 ∈ IR^p solve
\[
\min_{\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\bigl(y_i - b(x_i)^T \beta(x_0)\bigr)^2 \qquad (6.12)
\]
to produce the fit f̂(x_0) = b(x_0)^T β̂(x_0). Typically the kernel will be a radial function, such as the radial Epanechnikov or tri-cube kernel
\[
K_\lambda(x_0, x) = D\!\left(\frac{\|x - x_0\|}{\lambda}\right), \qquad (6.13)
\]
where ||·|| is the Euclidean norm. Since the Euclidean norm depends on the units in each coordinate, it makes most sense to standardize each predictor, for example, to unit standard deviation, prior to smoothing.

While boundary effects are a problem in one-dimensional smoothing, they are a much bigger problem in two or higher dimensions, since the fraction of points on the boundary is larger.
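A minimal sketch of this procedure for d = 1, assuming a radial tri-cube kernel and predictors standardized to unit standard deviation (names are illustrative):

```python
# A minimal sketch of (6.12)-(6.13) with d = 1 (a local hyperplane), a radial
# tri-cube kernel, and predictors standardized before distances are computed.
import numpy as np

def tricube(t):
    t = np.abs(t)
    return np.where(t < 1, (1 - t**3)**3, 0.0)

def local_linear_p(x0, X, y, lam):
    """Local linear fit at x0 in R^p, with radial kernel K_lambda(x0, x)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z, z0 = (X - mu) / sd, (x0 - mu) / sd              # standardized predictors
    w = tricube(np.linalg.norm(Z - z0, axis=1) / lam)  # radial kernel (6.13)
    B = np.column_stack([np.ones(len(Z)), Z])          # rows b(x_i)^T = (1, x_i1, ..., x_ip)
    WB = B * w[:, None]
    beta = np.linalg.solve(B.T @ WB, WB.T @ y)         # weighted least squares (6.12)
    return np.concatenate(([1.0], z0)) @ beta          # fhat(x0) = b(x0)^T beta-hat

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(300, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]**2 + rng.normal(scale=0.1, size=300)
print(local_linear_p(np.array([0.5, 0.5]), X, y, lam=0.7))
```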