The Elements of Statistical Learning: Data Mining, Inference, and Prediction
FIGURE 5.6. The response is the relative change in bone mineral density measured at the spine in adolescents, as a function of age. A separate smoothing spline was fit to the males and females, with λ ≈ 0.00022. This choice corresponds to about 12 degrees of freedom.

…where the N_j(x) are an N-dimensional set of basis functions for representing this family of natural splines (Section 5.2.1 and Exercise 5.4). The criterion thus reduces to

    RSS(θ, λ) = (y − Nθ)^T (y − Nθ) + λ θ^T Ω_N θ,        (5.11)

where {N}_{ij} = N_j(x_i) and {Ω_N}_{jk} = ∫ N_j''(t) N_k''(t) dt. The solution is easily seen to be

    θ̂ = (N^T N + λ Ω_N)^{-1} N^T y,        (5.12)

a generalized ridge regression.
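The computation in (5.11)–(5.12) is easy to sketch numerically. In the following Python fragment, a cubic B-spline basis with a modest knot set stands in for the natural-spline basis N_j (an assumption made purely for brevity; the algebra is identical), and the penalty matrix Ω_N is approximated by quadrature on a fine grid:

    import numpy as np
    from scipy.interpolate import BSpline

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, 60))
    y = np.sin(4.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

    k = 3                                             # cubic splines
    t = np.r_[[0.0] * k, np.linspace(0, 1, 12), [1.0] * k]   # clamped knot vector
    M = len(t) - k - 1                                # number of basis functions

    N = BSpline.design_matrix(x, t, k).toarray()      # {N}_ij = N_j(x_i)

    # {Omega}_jk = integral of N_j''(t) N_k''(t) dt, by quadrature on a grid
    grid = np.linspace(0.0, 1.0, 2001)
    d2 = BSpline(t, np.eye(M), k)(grid, nu=2)         # columns are the N_j''
    w = np.gradient(grid)                             # quadrature weights
    Omega = d2.T @ (d2 * w[:, None])

    lam = 1e-5
    theta_hat = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)   # eq. (5.12)
    f_hat = N @ theta_hat                             # fitted values at the x_i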
The fitted smoothing spline is given by

    f̂(x) = Σ_{j=1}^N N_j(x) θ̂_j.        (5.13)

Efficient computational techniques for smoothing splines are discussed in the Appendix to this chapter.

Figure 5.6 shows a smoothing spline fit to some data on bone mineral density (BMD) in adolescents. The response is relative change in spinal BMD over two consecutive visits, typically about one year apart. The data are color coded by gender, and two separate curves were fit.
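For routine use one need not form these matrices by hand; for example, SciPy (version 1.10 and later) provides scipy.interpolate.make_smoothing_spline, which minimizes the same penalized criterion. A minimal sketch on synthetic data (the BMD data themselves are not reproduced here, and the value of lam merely echoes the text; its scale depends on the units of the data):

    import numpy as np
    from scipy.interpolate import make_smoothing_spline

    rng = np.random.default_rng(1)
    age = np.sort(rng.uniform(9.0, 25.0, 200))        # synthetic "age" values
    signal = 0.1 * np.exp(-((age - 13.0) ** 2) / 8.0)  # a bump, for illustration
    bmd = signal + rng.normal(scale=0.03, size=age.size)

    spl = make_smoothing_spline(age, bmd, lam=0.00022)  # fixed smoothing parameter
    fitted = spl(age)                                    # evaluate f_hat at the data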
This simple summary reinforces the evidence in the data that the growth spurt for females precedes that for males by about two years. In both cases the smoothing parameter λ was approximately 0.00022; this choice is discussed in the next section.

5.4.1 Degrees of Freedom and Smoother Matrices

We have not yet indicated how λ is chosen for the smoothing spline. Later in this chapter we describe automatic methods using techniques such as cross-validation.
In this section we discuss intuitive ways of prespecifying the amount of smoothing.

A smoothing spline with prechosen λ is an example of a linear smoother (as in linear operator). This is because the estimated parameters in (5.12) are a linear combination of the y_i. Denote by f̂ the N-vector of fitted values f̂(x_i) at the training predictors x_i. Then

    f̂ = N(N^T N + λ Ω_N)^{-1} N^T y = S_λ y.        (5.14)

Again the fit is linear in y, and the finite linear operator S_λ is known as the smoother matrix. One consequence of this linearity is that the recipe for producing f̂ from y does not depend on y itself; S_λ depends only on the x_i and λ. Linear operators are familiar in more traditional least squares fitting as well.
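The linearity is easy to verify directly. A minimal sketch, in which a discrete second-difference penalty on an equally spaced grid stands in for the exact spline penalty (an assumption carried through the remaining sketches in this section):

    import numpy as np

    n, lam = 40, 1.0
    D = np.diff(np.eye(n), n=2, axis=0)              # second-difference operator
    S = np.linalg.inv(np.eye(n) + lam * (D.T @ D))   # built from the grid and lam only

    rng = np.random.default_rng(3)
    y1, y2 = rng.normal(size=n), rng.normal(size=n)
    # One fixed S_lambda maps every response vector linearly to its fit:
    print(np.allclose(S @ (y1 + 2.0 * y2), S @ y1 + 2.0 * (S @ y2)))  # True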
Suppose B_ξ is an N × M matrix of M cubic-spline basis functions evaluated at the N training points x_i, with knot sequence ξ, and M ≪ N. Then the vector of fitted spline values is given by

    f̂ = B_ξ (B_ξ^T B_ξ)^{-1} B_ξ^T y = H_ξ y.        (5.15)

Here the linear operator H_ξ is a projection operator, also known as the hat matrix in statistics. There are some important similarities and differences between H_ξ and S_λ:

• Both are symmetric, positive semidefinite matrices.

• H_ξ H_ξ = H_ξ (idempotent), while S_λ S_λ ⪯ S_λ, meaning that the right-hand side exceeds the left-hand side by a positive semidefinite matrix. This is a consequence of the shrinking nature of S_λ, which we discuss further below.

• H_ξ has rank M, while S_λ has rank N.

The expression M = trace(H_ξ) gives the dimension of the projection space, which is also the number of basis functions, and hence the number of parameters involved in the fit.
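The idempotent-versus-shrinking distinction can also be checked numerically; a sketch, where H is built from a small polynomial basis via QR (equivalent to the hat-matrix formula above, but numerically stabler) and S from the discrete stand-in penalty in the form (I + λK)^{-1} derived below:

    import numpy as np

    n = 50
    x = np.linspace(0.0, 1.0, n)
    B = np.vander(x, 6)                          # a small regression basis, M = 6
    Q, _ = np.linalg.qr(B)
    H = Q @ Q.T                                  # hat matrix: projection onto col(B)

    D = np.diff(np.eye(n), n=2, axis=0)
    S = np.linalg.inv(np.eye(n) + 0.01 * (D.T @ D))   # smoother, lambda = 0.01

    print(np.allclose(H @ H, H))                             # idempotent
    print(np.all(np.linalg.eigvalsh(S - S @ S) > -1e-10))    # S - SS is PSD: shrinking
    print(np.linalg.matrix_rank(H), np.linalg.matrix_rank(S))  # 6 versus n
    print(np.isclose(np.trace(H), 6.0))          # trace(H) = M, the dimension fitted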
By analogy we define the effective degrees of freedom of a smoothing spline to be

    df_λ = trace(S_λ),        (5.16)

the sum of the diagonal elements of S_λ. This very useful definition allows us a more intuitive way to parameterize the smoothing spline, and indeed many other smoothers as well, in a consistent fashion. For example, in Figure 5.6 we specified df_λ = 12 for each of the curves, and the corresponding λ ≈ 0.00022 was derived numerically by solving trace(S_λ) = 12.
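Deriving λ from a target df, as was done for Figure 5.6, amounts to solving trace(S_λ) = df numerically. A sketch, again with the discrete stand-in penalty, so the λ found here is illustrative only and not the book's 0.00022:

    import numpy as np
    from scipy.optimize import brentq

    n = 100
    D = np.diff(np.eye(n), n=2, axis=0)
    K = D.T @ D

    def df(lam):                     # effective degrees of freedom, eq. (5.16)
        return np.trace(np.linalg.inv(np.eye(n) + lam * K))

    # df(lam) decreases monotonically from n toward 2, so a root exists:
    lam12 = brentq(lambda lam: df(lam) - 12.0, 1e-8, 1e4)
    print(lam12, df(lam12))          # df(lam12) is 12 by construction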
There are many arguments supporting this definition of degrees of freedom, and we cover some of them here.

Since S_λ is symmetric (and positive semidefinite), it has a real eigen-decomposition. Before we proceed, it is convenient to rewrite S_λ in the Reinsch form

    S_λ = (I + λK)^{-1},        (5.17)

where K does not depend on λ (Exercise 5.9). Since f̂ = S_λ y solves

    min_f (y − f)^T (y − f) + λ f^T K f,        (5.18)

K is known as the penalty matrix, and indeed a quadratic form in K has a representation in terms of a weighted sum of squared (divided) second differences.
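The last remark becomes concrete on an equally spaced grid, where the divided second differences reduce to plain ones; a sketch:

    import numpy as np

    n = 30
    D = np.diff(np.eye(n), n=2, axis=0)   # maps f to its second differences
    K = D.T @ D

    f = np.random.default_rng(2).normal(size=n)
    lhs = f @ K @ f                       # quadratic form in the penalty matrix
    rhs = np.sum((f[2:] - 2.0 * f[1:-1] + f[:-2]) ** 2)
    print(np.isclose(lhs, rhs))           # True: sum of squared second differences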
The eigen-decomposition of S_λ is

    S_λ = Σ_{k=1}^N ρ_k(λ) u_k u_k^T        (5.19)

with

    ρ_k(λ) = 1/(1 + λ d_k),        (5.20)

and d_k the corresponding eigenvalue of K. Figure 5.7 (top) shows the results of applying a cubic smoothing spline to some air pollution data (128 observations). Two fits are given: a smoother fit corresponding to a larger penalty λ and a rougher fit for a smaller penalty. The lower panels represent the eigenvalues (lower left) and some eigenvectors (lower right) of the corresponding smoother matrices.

FIGURE 5.7. (Top:) Smoothing spline fit of ozone concentration versus Daggot pressure gradient. The two fits correspond to different values of the smoothing parameter, chosen to achieve five and eleven effective degrees of freedom, defined by df_λ = trace(S_λ). (Lower left:) First 25 eigenvalues for the two smoothing-spline matrices. The first two are exactly 1, and all are ≥ 0. (Lower right:) Third to sixth eigenvectors of the spline smoother matrices. In each case, u_k is plotted against x, and as such is viewed as a function of x. The rug at the base of the plots indicates the occurrence of data points. The damped functions represent the smoothed versions of these functions (using the 5 df smoother).

Some of the highlights of the eigenrepresentation are the following:

• The eigenvectors are not affected by changes in λ, and hence the whole family of smoothing splines (for a particular sequence x) indexed by λ have the same eigenvectors.

• S_λ y = Σ_{k=1}^N u_k ρ_k(λ) ⟨u_k, y⟩, and hence the smoothing spline operates by decomposing y with respect to the (complete) basis {u_k}, and differentially shrinking the contributions using ρ_k(λ). This is to be contrasted with a basis-regression method, where the components are either left alone, or shrunk to zero; that is, a projection matrix such as H_ξ above has M eigenvalues equal to 1, and the rest are 0. For this reason smoothing splines are referred to as shrinking smoothers, while regression splines are projection smoothers (see Figure 3.17 on page 80).

• The sequence of u_k, ordered by decreasing ρ_k(λ), appears to increase in complexity.
Indeed, they have the zero-crossing behavior of polynomials of increasing degree. Since S_λ u_k = ρ_k(λ) u_k, we see how each of the eigenvectors is itself shrunk by the smoothing spline: the higher the complexity, the more they are shrunk. If the domain of X is periodic, then the u_k are sines and cosines at different frequencies.

• The first two eigenvalues are always one, and they correspond to the two-dimensional eigenspace of functions linear in x (Exercise 5.11), which are never shrunk.

• The eigenvalues ρ_k(λ) = 1/(1 + λ d_k) are an inverse function of the eigenvalues d_k of the penalty matrix K, moderated by λ; λ controls the rate at which the ρ_k(λ) decrease to zero. d_1 = d_2 = 0 and again linear functions are not penalized.

• One can reparametrize the smoothing spline using the basis vectors u_k (the Demmler–Reinsch basis).
In this case the smoothing spline solves

    min_θ ‖y − Uθ‖² + λ θ^T D θ,        (5.21)

where U has columns u_k and D is a diagonal matrix with elements d_k.

• df_λ = trace(S_λ) = Σ_{k=1}^N ρ_k(λ). For projection smoothers, all the eigenvalues are 1, each one corresponding to a dimension of the projection subspace (see the numerical sketch below).

Figure 5.8 depicts a smoothing spline matrix, with the rows ordered with x. The banded nature of this representation suggests that a smoothing spline is a local fitting method, much like the locally weighted regression procedures in Chapter 6. The right panel shows in detail selected rows of S, which we call the equivalent kernels.
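These eigen-facts, and the equivalent-kernel view of Figure 5.8, can be checked in a few lines; a sketch with the discrete second-difference penalty standing in for the exact spline penalty K:

    import numpy as np

    n, lam = 100, 0.5
    x = np.linspace(0.0, 1.0, n)
    D = np.diff(np.eye(n), n=2, axis=0)
    K = D.T @ D                                    # discrete penalty matrix
    S = np.linalg.inv(np.eye(n) + lam * K)         # Reinsch form (5.17)

    d = np.linalg.eigvalsh(K)                      # eigenvalues d_k of K
    rho = 1.0 / (1.0 + lam * d)                    # eq. (5.20)
    print(np.allclose(np.sort(np.linalg.eigvalsh(S)), np.sort(rho)))  # eq. (5.19)
    print(np.round(np.sort(rho)[-2:], 6))          # the two largest are exactly 1
    print(np.isclose(np.trace(S), rho.sum()))      # df_lambda = sum_k rho_k

    f_lin = 1.0 + 2.0 * x                          # linear functions are never shrunk
    print(np.allclose(S @ f_lin, f_lin))

    row = S[n // 2]                                # an equivalent kernel (Figure 5.8)
    print(np.isclose(row.sum(), 1.0))              # the weights average the y_i
    print(row[n // 2 - 5 : n // 2 + 6].sum())      # and the bulk of the weight is local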
As λ → 0, df_λ → N, and S_λ → I, the N-dimensional identity matrix. As λ → ∞, df_λ → 2, and S_λ → H, the hat matrix for linear regression on x.

FIGURE 5.8. [Left panel: the smoother matrix. Right panel: the equivalent kernels given by rows 12, 25, 50, 75, 100, and 115 of S.]

5.5 Automatic Selection of the Smoothing Parameters

The smoothing parameters for regression splines encompass the degree of the splines, and the number and placement of the knots. For smoothing splines we have only λ to select, since the knots are at all the unique training X's, and cubic degree is almost always used in practice.
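Before turning to automatic selection, the two limits stated at the end of Section 5.4.1 can be verified numerically. In the following sketch the eigendecomposition of the stand-in penalty K (a Demmler–Reinsch-style basis, cf. (5.21)) makes (I + λK)^{-1} stable to evaluate even for extreme λ:

    import numpy as np

    n = 20
    x = np.linspace(0.0, 1.0, n)
    D2 = np.diff(np.eye(n), n=2, axis=0)
    K = D2.T @ D2
    d, U = np.linalg.eigh(K)                       # eigenvalues and eigenvectors of K

    def S(lam):                                    # stable form of (I + lam K)^{-1}
        return (U * (1.0 / (1.0 + lam * d))) @ U.T

    X = np.column_stack([np.ones(n), x])
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix for linear regression

    print(np.allclose(S(1e-10), np.eye(n)))        # lambda -> 0: the identity
    print(np.allclose(S(1e8), H, atol=1e-4))       # lambda -> infinity: linear fit
    print(np.trace(S(1e-10)), np.trace(S(1e8)))    # df approx n, and df approx 2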