The Elements of Statistical Learning: Data Mining, Inference, and Prediction (811377), page 39


The smoother matrix for a smoothing spline is nearly banded, indicating an equivalent kernel with local support. The left panel represents the elements of $\mathbf{S}$ as an image. The right panel shows the equivalent kernel or weighting function in detail for the indicated rows.

5. Basis Expansions and Regularization

... splines, we have only the penalty parameter $\lambda$ to select, since the knots are at all the unique training $X$'s, and cubic degree is almost always used in practice.

Selecting the placement and number of knots for regression splines can be a combinatorially complex task, unless some simplifications are enforced. The MARS procedure in Chapter 9 uses a greedy algorithm with some additional approximations to achieve a practical compromise.

We will not discuss this further here.

5.5.1 Fixing the Degrees of Freedom

Since $\mathrm{df}_\lambda = \mathrm{trace}(\mathbf{S}_\lambda)$ is monotone in $\lambda$ for smoothing splines, we can invert the relationship and specify $\lambda$ by fixing df. In practice this can be achieved by simple numerical methods. So, for example, in R one can use smooth.spline(x,y,df=6) to specify the amount of smoothing. This encourages a more traditional mode of model selection, where we might try a couple of different values of df, and select one based on approximate $F$-tests, residual plots and other more subjective criteria. Using df in this way provides a uniform approach to compare many different smoothing methods. It is particularly useful in generalized additive models (Chapter 9), where several smoothing methods can be simultaneously used in one model.

5.5.2 The Bias–Variance Tradeoff

Figure 5.9 shows the effect of the choice of $\mathrm{df}_\lambda$ when using a smoothing spline on a simple example:

$$Y = f(X) + \varepsilon, \qquad f(X) = \frac{\sin(12(X + 0.2))}{X + 0.2}, \tag{5.22}$$

with $X \sim U[0, 1]$ and $\varepsilon \sim N(0, 1)$.
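Two pieces of the discussion above can be sketched in a few lines of Python: inverting $\mathrm{df}_\lambda$ to find $\lambda$ (Section 5.5.1), and drawing a sample from model (5.22). This is only an illustration under stated assumptions, not how smooth.spline works internally. It assumes a smoother of the form $\mathbf{S}_\lambda = (\mathbf{I} + \lambda\mathbf{K})^{-1}$, so that $\mathrm{df}_\lambda = \sum_k 1/(1 + \lambda d_k)$ with $d_k$ the eigenvalues of $\mathbf{K}$. For $\mathbf{K}$ it uses a periodic second-difference penalty, a stand-in for the smoothing-spline penalty chosen because its eigenvalues have a closed form. The names `df_of_lambda`, `lambda_for_df`, `f_true` and `simulate` are hypothetical.

```python
import math
import random

def df_of_lambda(lam, penalties):
    """Effective degrees of freedom: sum over k of 1 / (1 + lam * d_k)."""
    return sum(1.0 / (1.0 + lam * d) for d in penalties)

def lambda_for_df(target_df, penalties, tol=1e-10):
    """Invert the monotone map lam -> df(lam) by bisection on log(lam)."""
    lo, hi = -30.0, 30.0  # assumed search range for log(lam)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if df_of_lambda(math.exp(mid), penalties) > target_df:
            lo = mid  # too many df: more smoothing needed, so lam must grow
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def f_true(x):
    """The true regression function in (5.22)."""
    return math.sin(12.0 * (x + 0.2)) / (x + 0.2)

def simulate(n, rng):
    """Draw n pairs (x_i, y_i) with X ~ U[0, 1] and unit-variance Gaussian noise."""
    xs = sorted(rng.random() for _ in range(n))
    ys = [f_true(x) + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

# Assumed penalty eigenvalues: periodic second-difference penalty on n points.
n = 100
penalties = [(2.0 - 2.0 * math.cos(2.0 * math.pi * k / n)) ** 2 for k in range(n)]

lam6 = lambda_for_df(6.0, penalties)
print("lambda giving df = 6:", lam6)
print("check df:", df_of_lambda(lam6, penalties))

xs, ys = simulate(n, random.Random(0))
print("first pair:", xs[0], ys[0])
```

Bisecting on $\log\lambda$ rather than $\lambda$ is a design choice: useful values of $\lambda$ span many orders of magnitude, and the log scale keeps the search well conditioned.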

Our training sample consists of $N = 100$ pairs $x_i, y_i$ drawn independently from this model.

FIGURE 5.9. The top left panel shows the $\mathrm{EPE}(\lambda)$ and $\mathrm{CV}(\lambda)$ curves for a realization from a nonlinear additive error model (5.22). The remaining panels show the data, the true functions (in purple), and the fitted curves (in green) with yellow shaded $\pm 2\times$ standard error bands, for three different values of $\mathrm{df}_\lambda$. [Panels: Cross-Validation ($\mathrm{EPE}(\lambda)$ and $\mathrm{CV}(\lambda)$ against $\mathrm{df}_\lambda$); $\mathrm{df}_\lambda = 5$; $\mathrm{df}_\lambda = 9$; $\mathrm{df}_\lambda = 15$ ($y$ against $X$).]

The fitted splines for three different values of $\mathrm{df}_\lambda$ are shown. The yellow shaded region in the figure represents the pointwise standard error of $\hat f_\lambda$, that is, we have shaded the region between $\hat f_\lambda(x) \pm 2 \cdot \mathrm{se}(\hat f_\lambda(x))$. Since $\hat{\mathbf{f}} = \mathbf{S}_\lambda \mathbf{y}$,

$$\mathrm{Cov}(\hat{\mathbf{f}}) = \mathbf{S}_\lambda \mathrm{Cov}(\mathbf{y}) \mathbf{S}_\lambda^T = \mathbf{S}_\lambda \mathbf{S}_\lambda^T. \tag{5.23}$$

The second equality uses $\mathrm{Cov}(\mathbf{y}) = \mathbf{I}$, which holds here because $\varepsilon$ has unit variance. The diagonal contains the pointwise variances at the training $x_i$. The bias is given by

$$\mathrm{Bias}(\hat{\mathbf{f}}) = \mathbf{f} - E(\hat{\mathbf{f}}) = \mathbf{f} - \mathbf{S}_\lambda \mathbf{f}, \tag{5.24}$$

where $\mathbf{f}$ is the (unknown) vector of evaluations of the true $f$ at the training $X$'s. The expectations and variances are with respect to repeated draws of samples of size $N = 100$ from the model (5.22). In a similar fashion $\mathrm{Var}(\hat f_\lambda(x_0))$ and $\mathrm{Bias}(\hat f_\lambda(x_0))$ can be computed at any point $x_0$ (Exercise 5.10).
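Formulas (5.23) and (5.24) apply to any linear smoother $\hat{\mathbf{f}} = \mathbf{S}\mathbf{y}$, so they can be tried out without fitting a spline. The sketch below swaps in a Gaussian-kernel (Nadaraya-Watson) smoother for the smoothing spline, because its matrix $\mathbf{S}$ can be written down directly. The bandwidth `h` then plays the role of the smoothing parameter. The evenly spaced grid of $x$ values, the two bandwidths and all function names are assumptions made for illustration.

```python
import math

def f_true(x):
    """The true regression function in (5.22)."""
    return math.sin(12.0 * (x + 0.2)) / (x + 0.2)

def smoother_matrix(xs, h):
    """Rows of a Gaussian-kernel smoother; each row is normalized to sum to 1."""
    S = []
    for xi in xs:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in xs]
        total = sum(w)
        S.append([wj / total for wj in w])
    return S

def pointwise_variance(S):
    """Diagonal of S S^T, as in (5.23) when Cov(y) is the identity."""
    return [sum(s * s for s in row) for row in S]

def pointwise_bias(S, f):
    """The vector f - S f, as in (5.24)."""
    return [fi - sum(s * fj for s, fj in zip(row, f)) for fi, row in zip(f, S)]

N = 100
xs = [(i + 0.5) / N for i in range(N)]  # assumed fixed design instead of random X
f = [f_true(x) for x in xs]

summary = {}
for h in (0.02, 0.2):  # small bandwidth = little smoothing, large = heavy smoothing
    S = smoother_matrix(xs, h)
    var = pointwise_variance(S)
    bias = pointwise_bias(S, f)
    summary[h] = (sum(b * b for b in bias) / N, sum(var) / N)
    print("h =", h, "mean squared bias =", summary[h][0], "mean variance =", summary[h][1])
```

Heavier smoothing should lower the average variance and raise the average squared bias, which is the tradeoff the three panels of the figure display for the spline.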

The three fits displayed in the figure give a visual demonstration of the bias-variance tradeoff associated with selecting the smoothing parameter.

$\mathrm{df}_\lambda = 5$: The spline underfits, and clearly trims down the hills and fills in the valleys. This leads to a bias that is most dramatic in regions of high curvature. The standard error band is very narrow, so we estimate a badly biased version of the true function with great reliability!

$\mathrm{df}_\lambda = 9$: Here the fitted function is close to the true function, although a slight amount of bias seems evident. The variance has not increased appreciably.

$\mathrm{df}_\lambda = 15$: The fitted function is somewhat wiggly, but close to the true function. The wiggliness also accounts for the increased width of the standard error bands: the curve is starting to follow some individual points too closely.

Note that in these figures we are seeing a single realization of data and hence fitted spline $\hat f$ in each case, while the bias involves an expectation $E(\hat f)$. We leave it as an exercise (5.10) to compute similar figures where the bias is shown as well.

The middle curve seems “just right,” in that it has achieved a good compromise between bias and variance.

The integrated squared prediction error (EPE) combines both bias and variance in a single summary:

$$\begin{aligned}
\mathrm{EPE}(\hat f_\lambda) &= E(Y - \hat f_\lambda(X))^2 \\
&= \mathrm{Var}(Y) + E\left[\mathrm{Bias}^2(\hat f_\lambda(X)) + \mathrm{Var}(\hat f_\lambda(X))\right] \\
&= \sigma^2 + \mathrm{MSE}(\hat f_\lambda).
\end{aligned} \tag{5.25}$$

Note that this is averaged both over the training sample (giving rise to $\hat f_\lambda$), and the values of the (independently chosen) prediction points $(X, Y)$. EPE is a natural quantity of interest, and does create a tradeoff between bias and variance. The blue points in the top left panel of Figure 5.9 suggest that $\mathrm{df}_\lambda = 9$ is spot on!

Since we don't know the true function, we do not have access to EPE, and need an estimate.
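The steps behind the decomposition (5.25) can be written out as follows. This is a sketch, assuming $Y = f(X) + \varepsilon$ with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$ (equal to 1 in model (5.22)), and assuming $\varepsilon$ is independent of $X$ and of the training sample that produced $\hat f_\lambda$. The first display drops the cross term because $\varepsilon$ has mean zero and is independent of the other factor. The second holds at a fixed point $x$, with the expectation taken over training samples.

```latex
% Assumptions: Y = f(X) + \varepsilon, E(\varepsilon) = 0, Var(\varepsilon) = \sigma^2,
% \varepsilon independent of X and of the training sample behind \hat f_\lambda.
\begin{align*}
E\bigl(Y - \hat f_\lambda(X)\bigr)^2
  &= E(\varepsilon^2)
     + 2\,E\bigl[\varepsilon\,\bigl(f(X) - \hat f_\lambda(X)\bigr)\bigr]
     + E\bigl(f(X) - \hat f_\lambda(X)\bigr)^2 \\
  &= \sigma^2 + E\bigl(f(X) - \hat f_\lambda(X)\bigr)^2, \\[4pt]
E\bigl[\bigl(f(x) - \hat f_\lambda(x)\bigr)^2\bigr]
  &= \bigl(f(x) - E\hat f_\lambda(x)\bigr)^2
     + \mathrm{Var}\bigl(\hat f_\lambda(x)\bigr) \\
  &= \mathrm{Bias}^2\bigl(\hat f_\lambda(x)\bigr)
     + \mathrm{Var}\bigl(\hat f_\lambda(x)\bigr).
\end{align*}
```

Averaging the second identity over the distribution of $X$ gives $\mathrm{MSE}(\hat f_\lambda)$, and substituting it into the first gives the last line of (5.25).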

This topic is discussed in some detail in Chapter 7, and techniques such as $K$-fold cross-validation, GCV and $C_p$ are all in common use. In Figure 5.9 we include the $N$-fold (leave-one-out) cross-validation curve:

\begin{align}
\mathrm{CV}(\hat f_\lambda) &= \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat f_\lambda^{(-i)}(x_i)\right)^2 \tag{5.26} \\
&= \frac{1}{N} \sum_{i=1}^{N} \left(\frac{y_i - \hat f_\lambda(x_i)}{1 - S_\lambda(i, i)}\right)^2, \tag{5.27}
\end{align}

which can (remarkably) be computed for each value of $\lambda$ from the original fitted values and the diagonal elements $S_\lambda(i, i)$ of $\mathbf{S}_\lambda$ (Exercise 5.13).

The EPE and CV curves have a similar shape, but the entire CV curve is above the EPE curve.
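The equality of (5.26) and (5.27) can be checked numerically on a simpler linear smoother. The sketch below uses a least-squares straight line rather than a smoothing spline. For that fit the same leave-one-out identity is known to hold, with $S(i,i)$ the diagonal of the hat matrix, which for a line is $1/N + (x_i - \bar x)^2 / \sum_j (x_j - \bar x)^2$. The data values and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight-line fit."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope

def cv_brute_force(xs, ys):
    """Leave-one-out CV as in (5.26): refit N times, each time omitting one point."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / n

def cv_shortcut(xs, ys):
    """Leave-one-out CV as in (5.27): one fit plus the diagonal of the smoother."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        s_ii = 1.0 / n + (x - xbar) ** 2 / sxx  # hat-matrix diagonal for a line
        total += ((y - (a + b * x)) / (1.0 - s_ii)) ** 2
    return total / n

# Made-up data, used only to compare the two computations.
xs = [0.0, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85, 1.0]
ys = [1.2, 0.7, 1.9, 2.4, 2.1, 3.3, 2.9, 4.0]

print("brute force:", cv_brute_force(xs, ys))
print("shortcut:   ", cv_shortcut(xs, ys))
```

The shortcut needs one fit instead of $N$, which is what makes it practical to trace the whole CV curve over a grid of smoothing parameters.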

For some realizations this is reversed, and overall the CV curve is approximately unbiased as an estimate of the EPE curve.

5.6 Nonparametric Logistic Regression

The smoothing spline problem (5.9) in Section 5.4 is posed in a regression setting. It is typically straightforward to transfer this technology to other domains. Here we consider logistic regression with a single quantitative input $X$. The model is

$$\log \frac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = f(x), \tag{5.28}$$

which implies

$$\Pr(Y = 1 \mid X = x) = \frac{e^{f(x)}}{1 + e^{f(x)}}. \tag{5.29}$$

Fitting $f(x)$ in a smooth fashion leads to a smooth estimate of the conditional probability $\Pr(Y = 1 \mid x)$, which can be used for classification or risk scoring.

We construct the penalized log-likelihood criterion

$$\begin{aligned}
\ell(f; \lambda) &= \sum_{i=1}^{N} \left[y_i \log p(x_i) + (1 - y_i) \log(1 - p(x_i))\right] - \frac{1}{2} \lambda \int \{f''(t)\}^2 \, dt \\
&= \sum_{i=1}^{N} \left[y_i f(x_i) - \log(1 + e^{f(x_i)})\right] - \frac{1}{2} \lambda \int \{f''(t)\}^2 \, dt,
\end{aligned} \tag{5.30}$$

where we have abbreviated $p(x) = \Pr(Y = 1 \mid x)$.

The first term in this expression is the log-likelihood based on the binomial distribution (cf. Chapter 4, page 120). Arguments similar to those used in Section 5.4 show that the optimal $f$ is a finite-dimensional natural spline with knots at the unique values of $x$. This means that we can represent $f(x) = \sum_{j=1}^{N} N_j(x)\theta_j$. We compute the first and second derivatives

\begin{align}
\frac{\partial \ell(\theta)}{\partial \theta} &= \mathbf{N}^T(\mathbf{y} - \mathbf{p}) - \lambda \mathbf{\Omega} \theta, \tag{5.31} \\
\frac{\partial^2 \ell(\theta)}{\partial \theta \, \partial \theta^T} &= -\mathbf{N}^T \mathbf{W} \mathbf{N} - \lambda \mathbf{\Omega}, \tag{5.32}
\end{align}

where $\mathbf{p}$ is the $N$-vector with elements $p(x_i)$, and $\mathbf{W}$ is a diagonal matrix of weights $p(x_i)(1 - p(x_i))$.
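The derivatives (5.31) and (5.32) are all that a Newton-Raphson iteration needs: each step moves $\theta$ by $(\mathbf{N}^T\mathbf{W}\mathbf{N} + \lambda\mathbf{\Omega})^{-1}$ times the gradient. The sketch below is a toy instance, not a natural-spline fit. It assumes the quadratic basis $1, x, x^2$ on $[0, 1]$ in place of $N_j(x)$, for which $\int_0^1 \{f''(t)\}^2 dt = 4\theta_3^2$ and hence $\mathbf{\Omega}$ has the single nonzero entry 4. The data, the value of $\lambda$ and all function names are made up for illustration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def linear_predictor(Nmat, theta):
    """f(x_i) = sum_j N_j(x_i) theta_j for every row of the basis matrix."""
    return [sum(v * t for v, t in zip(row, theta)) for row in Nmat]

def probs(Nmat, theta):
    """p(x_i) from the logistic model (5.29)."""
    return [1.0 / (1.0 + math.exp(-fi)) for fi in linear_predictor(Nmat, theta)]

def loglik(Nmat, y, theta, lam, Omega):
    """Penalized log-likelihood (5.30), with the penalty written as theta' Omega theta."""
    f = linear_predictor(Nmat, theta)
    fit = sum(yi * fi - math.log(1.0 + math.exp(fi)) for yi, fi in zip(y, f))
    m = len(theta)
    pen = sum(theta[j] * Omega[j][k] * theta[k] for j in range(m) for k in range(m))
    return fit - 0.5 * lam * pen

def gradient(Nmat, y, theta, lam, Omega):
    """First derivative (5.31): N'(y - p) - lam * Omega * theta."""
    p = probs(Nmat, theta)
    m = len(theta)
    return [sum(row[j] * (yi - pi) for row, yi, pi in zip(Nmat, y, p))
            - lam * sum(Omega[j][k] * theta[k] for k in range(m))
            for j in range(m)]

def neg_hessian(Nmat, theta, lam, Omega):
    """Negative of the second derivative (5.32): N'WN + lam * Omega."""
    p = probs(Nmat, theta)
    m = len(theta)
    return [[sum(row[j] * pi * (1.0 - pi) * row[k] for row, pi in zip(Nmat, p))
             + lam * Omega[j][k] for k in range(m)] for j in range(m)]

def newton(Nmat, y, lam, Omega, steps=25):
    """Newton-Raphson: theta <- theta + (N'WN + lam * Omega)^{-1} * gradient."""
    theta = [0.0] * len(Nmat[0])
    for _ in range(steps):
        step = solve(neg_hessian(Nmat, theta, lam, Omega),
                     gradient(Nmat, y, theta, lam, Omega))
        theta = [t + s for t, s in zip(theta, step)]
    return theta

# Made-up data and a toy quadratic basis standing in for the natural-spline basis.
x = [i / 9.0 for i in range(10)]
y = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
Nmat = [[1.0, xi, xi * xi] for xi in x]
Omega = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
lam = 1.0

theta_hat = newton(Nmat, y, lam, Omega)
print("theta_hat:", theta_hat)
print("gradient at theta_hat:", gradient(Nmat, y, theta_hat, lam, Omega))
```

A fixed number of steps keeps the sketch short; a practical version would stop once the step or the gradient falls below a tolerance.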

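The first derivative (5.31) is nonlinear in $\theta$, so we need to use an iterative algorithm as in Section 4.4.1. Using Newton–Raphson as in (4.23) and (4.26) for linear logistic regression, the update equation can be written

$$\begin{aligned}
\theta^{\mathrm{new}} &= (\mathbf{N}^T \mathbf{W} \mathbf{N} + \lambda \mathbf{\Omega})^{-1} \mathbf{N}^T \mathbf{W} \left(\mathbf{N} \theta^{\mathrm{old}} + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})\right) \\
&= (\mathbf{N}^T \mathbf{W} \mathbf{N} + \lambda \mathbf{\Omega})^{-1} \mathbf{N}^T \mathbf{W} \mathbf{z},
\end{aligned} \tag{5.33}$$

with working response $\mathbf{z} = \mathbf{N} \theta^{\mathrm{old}} + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})$. We can also express this update in terms of the fitted values

$$\begin{aligned}
\mathbf{f}^{\mathrm{new}} &= \mathbf{N} (\mathbf{N}^T \mathbf{W} \mathbf{N} + \lambda \mathbf{\Omega})^{-1} \mathbf{N}^T \mathbf{W} \left(\mathbf{f}^{\mathrm{old}} + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})\right) \\
&= \mathbf{S}_{\lambda, w} \mathbf{z}.
\end{aligned} \tag{5.34}$$

Referring back to (5.12) and (5.14), we see that the update fits a weighted smoothing spline to the working response $\mathbf{z}$ (Exercise 5.12).

The form of (5.34) is suggestive. It is tempting to replace $\mathbf{S}_{\lambda, w}$ by any nonparametric (weighted) regression operator, and obtain general families of nonparametric logistic regression models. Although here $x$ is one-dimensional, this procedure generalizes naturally to higher-dimensional $x$. These extensions are at the heart of generalized additive models, which we pursue in Chapter 9.

5.7 Multidimensional Splines

So far we have focused on one-dimensional spline models.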

Each of the approaches has multidimensional analogs. Suppose $X \in \mathbb{R}^2$, and we have a basis of functions $h_{1k}(X_1)$, $k = 1, \ldots, M_1$ for representing functions of coordinate $X_1$, and likewise a set of $M_2$ functions $h_{2k}(X_2)$ for coordinate $X_2$. Then the $M_1 \times M_2$ dimensional tensor product basis defined by

$$g_{jk}(X) = h_{1j}(X_1) h_{2k}(X_2), \quad j = 1, \ldots, M_1, \; k = 1, \ldots, M_2 \tag{5.35}$$

can be used for representing a two-dimensional function:

$$g(X) = \sum_{j=1}^{M_1} \sum_{k=1}^{M_2} \theta_{jk} g_{jk}(X). \tag{5.36}$$

FIGURE 5.10. A tensor product basis of B-splines, showing some selected pairs. Each two-dimensional function is the tensor product of the corresponding one-dimensional marginals.

Figure 5.10 illustrates a tensor product basis using B-splines. The coefficients can be fit by least squares, as before. This can be generalized to $d$ dimensions, but note that the dimension of the basis grows exponentially fast, yet another manifestation of the curse of dimensionality.
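The construction in (5.35) and (5.36) can be sketched directly. The example below uses small polynomial bases as stand-ins for B-splines, since the tensor product itself does not depend on which one-dimensional bases are used. The coefficient vector is stored flattened with index $j M_2 + k$. The function names and the toy bases are assumptions made for illustration.

```python
from itertools import product

def tensor_basis(h1, h2):
    """All products g_jk(X) = h1[j](X1) * h2[k](X2), as in (5.35), in row-major order."""
    return [
        (lambda X, a=a, b=b: a(X[0]) * b(X[1]))
        for a, b in product(h1, h2)
    ]

def evaluate(theta, basis, X):
    """g(X) = sum over j, k of theta_jk * g_jk(X), as in (5.36)."""
    return sum(t * g(X) for t, g in zip(theta, basis))

def basis_size(M, d):
    """Number of tensor product functions with M basis functions per coordinate."""
    return M ** d

# Toy one-dimensional bases standing in for B-splines.
h1 = [lambda u: 1.0, lambda u: u, lambda u: u * u]  # M1 = 3
h2 = [lambda v: 1.0, lambda v: v]                   # M2 = 2
basis = tensor_basis(h1, h2)

# Coefficients chosen so that g(X) = 1 + 2 * X1 * X2.
theta = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
value = evaluate(theta, basis, (3.0, 4.0))
print("number of basis functions:", len(basis))
print("g((3, 4)) =", value)

for d in (1, 2, 3, 5):
    print("d =", d, "basis size with M = 10:", basis_size(10, d))
```

With 10 functions per coordinate, the basis already has 100,000 elements in five dimensions, which is the exponential growth the text warns about.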

The MARS procedure discussed in Chapter 9 is a greedy forward algorithm for including only those tensor products that are deemed necessary by least squares.

Figure 5.11 illustrates the difference between additive and tensor product (natural) splines on the simulated classification example from Chapter 2. A logistic regression model $\mathrm{logit}[\Pr(T \mid x)] = h(x)^T \theta$ is fit to the binary response, and the estimated decision boundary is the contour $h(x)^T \hat\theta = 0$. The tensor product basis can achieve more flexibility at the decision boundary, but introduces some spurious structure along the way.

[Figure panel: Additive Natural Cubic Splines - 4 df each]
