The Elements of Statistical Learning. Data Mining_ Inference_ and Prediction (811377), page 50


Denoting the lag set by $z_t = (y_{t-1}, y_{t-2}, \ldots, y_{t-k})$, the model looks like a standard linear model $y_t = z_t^T \beta + \varepsilon_t$, and is typically fit by least squares. Fitting by local least squares with a kernel $K(z_0, z_t)$ allows the model to vary according to the short-term history of the series. This is to be distinguished from the more traditional dynamic linear models that vary by windowing time.

As an illustration of local likelihood, we consider the local version of the multiclass linear logistic regression model (4.36) of Chapter 4. The data consist of features $x_i$ and an associated categorical response $g_i \in \{1, 2, \ldots, J\}$, and the linear model has the form

$$\Pr(G = j \mid X = x) = \frac{e^{\beta_{j0} + \beta_j^T x}}{1 + \sum_{k=1}^{J-1} e^{\beta_{k0} + \beta_k^T x}}. \tag{6.18}$$

The local log-likelihood for this $J$ class model can be written

$$\sum_{i=1}^{N} K_\lambda(x_0, x_i) \Bigg\{ \beta_{g_i 0}(x_0) + \beta_{g_i}(x_0)^T (x_i - x_0) - \log\Bigg[ 1 + \sum_{k=1}^{J-1} \exp\Big( \beta_{k0}(x_0) + \beta_k(x_0)^T (x_i - x_0) \Big) \Bigg] \Bigg\}. \tag{6.19}$$

Notice that
• we have used $g_i$ as a subscript in the first line to pick out the appropriate numerator;
• $\beta_{J0} = 0$ and $\beta_J = 0$ by the definition of the model;
• we have centered the local regressions at $x_0$, so that the fitted posterior probabilities at $x_0$ are simply

$$\hat{\Pr}(G = j \mid X = x_0) = \frac{e^{\hat\beta_{j0}(x_0)}}{1 + \sum_{k=1}^{J-1} e^{\hat\beta_{k0}(x_0)}}. \tag{6.20}$$

FIGURE 6.12. Each plot shows the binary response CHD (coronary heart disease) as a function of a risk factor for the South African heart disease data. For each plot we have computed the fitted prevalence of CHD using a local linear logistic regression model. The unexpected increase in the prevalence of CHD at the lower ends of the ranges is because these are retrospective data, and some of the subjects had already undergone treatment to reduce their blood pressure and weight. The shaded region in the plot indicates an estimated pointwise standard error band. [Panels: prevalence of CHD against systolic blood pressure (left) and against obesity (right).]

This model can be used for flexible multiclass classification in moderately low dimensions, although successes have been reported with the high-dimensional ZIP-code classification problem.
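The two-class case ($J = 2$) of the local log-likelihood (6.19) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a scalar feature, a 0/1 coding of the response, a tri-cube kernel with a fixed metric bandwidth `lam` (rather than a k-NN bandwidth), and plain Newton steps. The names `tricube` and `local_logistic`, the small `ridge` term, and the simulated data are all hypothetical.

```python
import numpy as np

def tricube(u):
    """Tri-cube kernel: (1 - |u|^3)^3 for |u| <= 1, zero elsewhere."""
    u = np.abs(u)
    return np.where(u <= 1.0, (1.0 - u**3) ** 3, 0.0)

def local_logistic(x0, x, y, lam, n_iter=25, ridge=1e-8):
    """Maximize the two-class local log-likelihood centered at x0.

    Returns beta = (b0, b1). Because the regression is centered at x0,
    the fitted prevalence at x0 is sigmoid(b0), as in (6.20).
    """
    w = tricube((x - x0) / lam)                      # kernel weights K_lambda(x0, x_i)
    Z = np.column_stack([np.ones_like(x), x - x0])   # intercept and centered slope
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        grad = Z.T @ (w * (y - p))                   # weighted score
        # Weighted information matrix; the tiny ridge (an assumption made
        # here) only guards against a singular matrix in sparse windows.
        H = (Z * (w * p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(2)
        step = np.linalg.solve(H, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Simulated data (not the South African heart disease data).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=400)
p_true = 1.0 / (1.0 + np.exp(-(x**2 - 1.0)))        # nonlinear on the logit scale
y = (rng.uniform(size=400) < p_true).astype(float)
b0, b1 = local_logistic(0.0, x, y, lam=1.0)
print("fitted prevalence at x0 = 0:", 1.0 / (1.0 + np.exp(-b0)))
```

Repeating the fit over a grid of `x0` values traces out a fitted prevalence curve of the kind described for Figure 6.12.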

Generalized additive models (Chapter 9) using kernel smoothing methods are closely related, and avoid dimensionality problems by assuming an additive structure for the regression function.

As a simple illustration we fit a two-class local linear logistic model to the heart disease data of Chapter 4. Figure 6.12 shows the univariate local logistic models fit to two of the risk factors (separately). This is a useful screening device for detecting nonlinearities, when the data themselves have little visual information to offer. In this case an unexpected anomaly is uncovered in the data, which may have gone unnoticed with traditional methods.

Since CHD is a binary indicator, we could estimate the conditional prevalence $\Pr(G = j \mid x_0)$ by simply smoothing this binary response directly without resorting to a likelihood formulation. This amounts to fitting a locally constant logistic regression model (Exercise 6.5).
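A short derivation shows why direct smoothing coincides with a locally constant logistic fit in the two-class case. It assumes a 0/1 coding $y_i$ of the response and writes $p$ for the fitted prevalence at $x_0$; both symbols are introduced here for illustration only.

```latex
% Local log-likelihood with an intercept only (no slope term):
\ell(\beta_0) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,
  \bigl[\, y_i \beta_0 - \log\bigl(1 + e^{\beta_0}\bigr) \bigr],
\qquad p = \frac{e^{\beta_0}}{1 + e^{\beta_0}} .

% Setting the derivative with respect to \beta_0 to zero:
\frac{\partial \ell}{\partial \beta_0}
  = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,(y_i - p) = 0
\quad\Longrightarrow\quad
\hat p(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}
                   {\sum_{i=1}^{N} K_\lambda(x_0, x_i)} .
```

The maximizer is the kernel-weighted average of the binary responses, that is, the Nadaraya-Watson smoother applied to $y_i$.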

In order to enjoy the bias-correction of local-linear smoothing, it is more natural to operate on the unrestricted logit scale.

Typically with logistic regression, we compute parameter estimates as well as their standard errors.

This can be done locally as well, and so we can produce, as shown in the plot, estimated pointwise standard-error bands about our fitted prevalence.

FIGURE 6.13. A kernel density estimate for systolic blood pressure (for the CHD group). The density estimate at each point is the average contribution from each of the kernels at that point. We have scaled the kernels down by a factor of 10 to make the graph readable. [Axes: density estimate against systolic blood pressure (for CHD group).]

6.6 Kernel Density Estimation and Classification

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

6.6.1 Kernel Density Estimation

Suppose we have a random sample $x_1, \ldots, x_N$ drawn from a probability density $f_X(x)$, and we wish to estimate $f_X$ at a point $x_0$. For simplicity we assume for now that $X \in \mathbb{R}$. Arguing as before, a natural local estimate has the form

$$\hat f_X(x_0) = \frac{\# x_i \in \mathcal{N}(x_0)}{N\lambda}, \tag{6.21}$$

where $\mathcal{N}(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$. This estimate is bumpy, and the smooth Parzen estimate is preferred

$$\hat f_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i), \tag{6.22}$$

because it counts observations close to $x_0$ with weights that decrease with distance from $x_0$.

FIGURE 6.14. The left panel shows the two separate density estimates for systolic blood pressure in the CHD versus no-CHD groups, using a Gaussian kernel density estimate in each. The right panel shows the estimated posterior probabilities for CHD, using (6.25). [Axes: density estimates (left) and posterior estimate (right), each against systolic blood pressure.]

In this case a popular choice for $K_\lambda$ is the Gaussian kernel $K_\lambda(x_0, x) = \phi(|x - x_0|/\lambda)$.
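The Parzen estimate (6.22) with the Gaussian kernel is short enough to write out directly. The sketch below assumes a one-dimensional sample and takes $\phi$ to be the standard normal density; the function name `gaussian_kde_1d`, the bandwidth value, and the simulated sample are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(x0, x, lam):
    """Parzen estimate (6.22) with K_lambda(x0, x) = phi(|x - x0| / lam).

    x0 may be a scalar or an array of evaluation points; x is the sample.
    """
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    u = (x0[:, None] - x[None, :]) / lam             # pairwise scaled distances
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return phi.sum(axis=1) / (len(x) * lam)          # (1 / (N lam)) * sum_i K

# Simulated sample (not the blood pressure data from the text).
rng = np.random.default_rng(0)
sample = rng.normal(loc=140.0, scale=20.0, size=160)
grid = np.linspace(80.0, 220.0, 8)
print(gaussian_kde_1d(grid, sample, lam=7.5))
```

Larger values of `lam` give a smoother but more biased estimate; smaller values give the bumpier, higher-variance behavior the text attributes to narrow neighborhoods.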

Figure 6.13 shows a Gaussian kernel density fit to the sample values for systolic blood pressure for the CHD group. Letting $\phi_\lambda$ denote the Gaussian density with mean zero and standard-deviation $\lambda$, then (6.22) has the form

$$\hat f_X(x) = \frac{1}{N} \sum_{i=1}^{N} \phi_\lambda(x - x_i) = (\hat F \star \phi_\lambda)(x), \tag{6.23}$$

the convolution of the sample empirical distribution $\hat F$ with $\phi_\lambda$. The distribution $\hat F(x)$ puts mass $1/N$ at each of the observed $x_i$, and is jumpy; in $\hat f_X(x)$ we have smoothed $\hat F$ by adding independent Gaussian noise to each observation $x_i$.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression [on the log scale for densities; see Loader (1999)].
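The step from (6.22) to (6.23) rests on the scaling relation between $\phi$ and $\phi_\lambda$, written out below. The auxiliary variables $I$, $Z$ and $X^*$ in the comment are introduced here only to spell out the noise interpretation.

```latex
% Scaling relation between the standard normal density and \phi_\lambda:
\phi_\lambda(u) = \frac{1}{\lambda\sqrt{2\pi}}\, e^{-u^2/(2\lambda^2)}
               = \frac{1}{\lambda}\,\phi\!\left(\frac{u}{\lambda}\right)
\quad\Longrightarrow\quad
\frac{1}{N\lambda}\sum_{i=1}^{N} \phi\!\left(\frac{|x - x_i|}{\lambda}\right)
  = \frac{1}{N}\sum_{i=1}^{N} \phi_\lambda(x - x_i).

% Noise interpretation: draw I uniformly from \{1, \ldots, N\} and
% Z \sim N(0, 1) independently; then X^* = x_I + \lambda Z has density \hat f_X.
```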

We will not pursue these here.

In $\mathbb{R}^p$ the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (6.23),

$$\hat f_X(x_0) = \frac{1}{N (2\lambda^2 \pi)^{p/2}} \sum_{i=1}^{N} e^{-\frac{1}{2} (\|x_i - x_0\| / \lambda)^2}. \tag{6.24}$$

FIGURE 6.15. The population class densities may have interesting structure (left) that disappears when the posterior probabilities are formed (right).

6.6.2 Kernel Density Classification

One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose for a $J$ class problem we fit nonparametric density estimates $\hat f_j(X)$, $j = 1, \ldots, J$ separately in each of the classes, and we also have estimates of the class priors $\hat\pi_j$ (usually the sample proportions). Then

$$\hat{\Pr}(G = j \mid X = x_0) = \frac{\hat\pi_j \hat f_j(x_0)}{\sum_{k=1}^{J} \hat\pi_k \hat f_k(x_0)}. \tag{6.25}$$

Figure 6.14 uses this method to estimate the prevalence of CHD for the heart risk factor study, and should be compared with the left panel of Figure 6.12. The main difference occurs in the region of high SBP in the right panel of Figure 6.14. In this region the data are sparse for both classes, and since the Gaussian kernel density estimates use metric kernels, the density estimates are low and of poor quality (high variance) in these regions.
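The posterior estimate (6.25) combines per-class density estimates with class priors. A minimal one-dimensional sketch follows, assuming Gaussian kernel density estimates with a common bandwidth `lam` and sample proportions as priors. The function names and the simulated two-class data are hypothetical, and the code does not guard against all class densities underflowing to zero far from the data.

```python
import numpy as np

def gaussian_kde_1d(x0, x, lam):
    """Gaussian Parzen density estimate, as in (6.22), at the points x0."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    u = (x0[:, None] - x[None, :]) / lam
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return phi.sum(axis=1) / (len(x) * lam)

def kde_posterior(x0, samples, lam, priors=None):
    """Posterior class probabilities (6.25).

    samples is a list with one 1-D array per class. Returns an array of
    shape (len(x0), J) whose rows sum to one.
    """
    counts = np.array([len(s) for s in samples], dtype=float)
    if priors is None:
        priors = counts / counts.sum()               # sample proportions
    priors = np.asarray(priors, dtype=float)
    dens = np.column_stack([gaussian_kde_1d(x0, s, lam) for s in samples])
    joint = dens * priors                            # pi_j * f_j(x0)
    return joint / joint.sum(axis=1, keepdims=True)

# Simulated two-class data (not the heart disease study).
rng = np.random.default_rng(1)
no_chd = rng.normal(130.0, 15.0, size=300)
chd = rng.normal(145.0, 20.0, size=160)
print(kde_posterior(np.array([120.0, 160.0]), [no_chd, chd], lam=7.5))
```

Because the bandwidth is metric, regions with few observations in either class produce small, noisy numerators and denominators, which is the high-variance behavior noted for high SBP.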

The local logistic regression method (6.20) uses the tri-cube kernel with k-NN bandwidth; this effectively widens the kernel in this region, and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary, and can in fact be misleading. Figure 6.15 shows an example where the densities are both multimodal, but the posterior ratio is quite smooth.

In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit to capture these features, which are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only to estimate the posterior well near the decision boundary (for two classes, this is the set $\{x \mid \Pr(G = 1 \mid X = x) = \tfrac{1}{2}\}$).

6.6.3 The Naive Bayes Classifier

This is a technique that has remained popular over the years, despite its name (also known as "Idiot's Bayes"!). It is especially appropriate when the dimension $p$ of the feature space is high, making density estimation unattractive. The naive Bayes model assumes that given a class $G = j$, the features $X_k$ are independent:

$$f_j(X) = \prod_{k=1}^{p} f_{jk}(X_k). \tag{6.26}$$

While this assumption is generally not true, it does simplify the estimation dramatically:
• The individual class-conditional marginal densities $f_{jk}$ can each be estimated separately using one-dimensional kernel density estimates. This is in fact a generalization of the original naive Bayes procedures, which used univariate Gaussians to represent these marginals.
• If a component $X_j$ of $X$ is discrete, then an appropriate histogram estimate can be used. This provides a seamless way of mixing variable types in a feature vector.

Despite these rather optimistic assumptions, naive Bayes classifiers often outperform far more sophisticated alternatives. The reasons are related to Figure 6.15: although the individual class density estimates may be biased, this bias might not hurt the posterior probabilities as much, especially near the decision regions. In fact, the problem may be able to withstand considerable bias for the savings in variance such a "naive" assumption earns.

Starting from (6.26) we can derive the logit-transform (using class $J$ as the base):

$$\begin{aligned}
\log \frac{\Pr(G = \ell \mid X)}{\Pr(G = J \mid X)}
&= \log \frac{\pi_\ell f_\ell(X)}{\pi_J f_J(X)} \\
&= \log \frac{\pi_\ell \prod_{k=1}^{p} f_{\ell k}(X_k)}{\pi_J \prod_{k=1}^{p} f_{Jk}(X_k)} \\
&= \log \frac{\pi_\ell}{\pi_J} + \sum_{k=1}^{p} \log \frac{f_{\ell k}(X_k)}{f_{Jk}(X_k)} \\
&= \alpha_\ell + \sum_{k=1}^{p} g_{\ell k}(X_k).
\end{aligned} \tag{6.27}$$

This has the form of a generalized additive model, which is described in more detail in Chapter 9. The models are fit in quite different ways though; their differences are explored in Exercise 6.9.
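The additive form (6.27) can be computed term by term from one-dimensional kernel density estimates. The sketch below is one possible reading, not a reference implementation: it assumes continuous features, Gaussian kernels with a single shared bandwidth `lam`, and sample proportions as priors. The function names and the simulated data are hypothetical. Densities are handled on the log scale so that products over many features do not underflow.

```python
import numpy as np

def log_kde_1d(x0, x, lam):
    """Log of the Gaussian Parzen estimate (6.22), computed via log-sum-exp."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    u = (x0[:, None] - x[None, :]) / lam
    log_phi = -0.5 * u**2 - 0.5 * np.log(2.0 * np.pi) - np.log(lam)
    m = log_phi.max(axis=1, keepdims=True)
    lse = m + np.log(np.exp(log_phi - m).sum(axis=1, keepdims=True))
    return lse.ravel() - np.log(len(x))

def naive_bayes_logit_terms(X0, X, g, lam, ell, base):
    """Return (alpha, terms) for the logit of class ell against class base.

    alpha is log(pi_ell / pi_base); terms[i, k] is g_{ell k} evaluated at
    X0[i, k], i.e. the log-ratio of the k-th marginal density estimates.
    The logit in (6.27) is alpha + terms.sum(axis=1).
    """
    X0 = np.atleast_2d(np.asarray(X0, dtype=float))
    X = np.asarray(X, dtype=float)
    g = np.asarray(g)
    in_ell, in_base = (g == ell), (g == base)
    alpha = np.log(in_ell.mean() / in_base.mean())
    terms = np.column_stack([
        log_kde_1d(X0[:, k], X[in_ell, k], lam)
        - log_kde_1d(X0[:, k], X[in_base, k], lam)
        for k in range(X.shape[1])
    ])
    return alpha, terms

# Simulated two-class data with three features.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
               rng.normal(1.0, 1.0, size=(60, 3))])
g = np.array([0] * 100 + [1] * 60)
alpha, terms = naive_bayes_logit_terms(X[:5], X, g, lam=0.5, ell=1, base=0)
logit = alpha + terms.sum(axis=1)
print("posterior for class 1:", 1.0 / (1.0 + np.exp(-logit)))
```

Keeping the per-feature terms separate makes the additive structure visible: each column of `terms` is a function of a single coordinate.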

The relationship between naive Bayes and generalized additive models is analogous to that between linear discriminant analysis and logistic regression (Section 4.4.5).

6.7 Radial Basis Functions and Kernels

In Chapter 5, functions are represented as expansions in basis functions: $f(x) = \sum_{j=1}^{M} \beta_j h_j(x)$. The art of flexible modeling using basis expansions consists of picking an appropriate family of basis functions, and then controlling the complexity of the representation by selection, regularization, or both. Some of the families of basis functions have elements that are defined locally; for example, B-splines are defined locally in $\mathbb{R}$. If more flexibility is desired in a particular region, then that region needs to be represented by more basis functions (which in the case of B-splines translates to more knots). Tensor products of $\mathbb{R}$-local basis functions deliver basis functions local in $\mathbb{R}^p$. Not all basis functions are local; for example, the truncated power bases for splines, or the sigmoidal basis functions $\sigma(\alpha_0 + \alpha x)$ used in neural-networks (see Chapter 11). The composed function $f(x)$ can nevertheless show local behavior, because of the particular signs and values of the coefficients causing cancellations of global effects.
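A small example makes the cancellation concrete. Each degree-one truncated power function $(x - \xi)_+$ is global (nonzero for all $x > \xi$), yet the combination with coefficients $(1, -2, 1)$ at knots $0, 1, 2$ is a hat function that vanishes outside $[0, 2]$. The knots, coefficients, and function names are chosen here purely for illustration.

```python
import numpy as np

def pos(x, knot):
    """Degree-one truncated power basis function (x - knot)_+ ."""
    return np.maximum(x - knot, 0.0)

def hat(x):
    """f(x) = sum_j beta_j h_j(x) with beta = (1, -2, 1), knots 0, 1, 2.

    For x >= 2 the terms sum to x - 2(x - 1) + (x - 2) = 0, so the global
    growth of the individual basis functions cancels exactly.
    """
    return pos(x, 0.0) - 2.0 * pos(x, 1.0) + pos(x, 2.0)

x = np.array([-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 10.0])
print(hat(x))
```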
