It shares many properties with the one-dimensional cubic smoothing spline:

• as $\lambda \to 0$, the solution approaches an interpolating function [the one with smallest penalty (5.38)];

• as $\lambda \to \infty$, the solution approaches the least squares plane;

• for intermediate values of $\lambda$, the solution can be represented as a linear expansion of basis functions, whose coefficients are obtained by a form of generalized ridge regression.

The solution has the form

$$ f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j h_j(x), \qquad (5.39) $$

where $h_j(x) = \|x - x_j\|^2 \log \|x - x_j\|$.
These $h_j$ are examples of radial basis functions, which are discussed in more detail in the next section. The coefficients are found by plugging (5.39) into (5.37), which reduces to a finite-dimensional penalized least squares problem. For the penalty to be finite, the coefficients $\alpha_j$ have to satisfy a set of linear constraints; see Exercise 5.14.

Thin-plate splines are defined more generally for arbitrary dimension $d$, for which an appropriately more general $J$ is used.

There are a number of hybrid approaches that are popular in practice, both for computational and conceptual simplicity. Unlike one-dimensional smoothing splines, the computational complexity for thin-plate splines is $O(N^3)$, since there is not in general any sparse structure that can be exploited.
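To make this concrete, here is a minimal numpy sketch of the dense fit, assuming the standard linear system for this penalized criterion; the side condition $T^T\alpha = 0$ plays the role of the linear constraints of Exercise 5.14, and the function names are illustrative rather than from the text.

```python
import numpy as np

def tps_basis(X, knots):
    # Radial basis of (5.39): h_j(x) = ||x - x_j||^2 log ||x - x_j||
    r = np.linalg.norm(X[:, None, :] - knots[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r**2 * np.log(r), 0.0)

def fit_tps(X, y, lam):
    # Penalized least squares for (5.39): solve the dense system
    #   (E + lam*I) alpha + T beta = y,   T^T alpha = 0,
    # where T = [1, x] spans the null space of the penalty.
    # The (N+d+1)-dimensional solve is the source of the O(N^3) cost.
    N = X.shape[0]
    E = tps_basis(X, X)                       # N x N
    T = np.column_stack([np.ones(N), X])      # N x (d+1)
    M = T.shape[1]
    A = np.block([[E + lam * np.eye(N), T],
                  [T.T, np.zeros((M, M))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(M)]))
    alpha, beta = sol[:N], sol[N:]
    return lambda Xnew: (tps_basis(Xnew, X) @ alpha
                         + np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta)
```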
However, as with univariate smoothing splines, we can get away with substantially less than the $N$ knots prescribed by the solution (5.39).

[FIGURE 5.12. A thin-plate spline fit to the heart disease data, displayed as a contour plot. The response is systolic blood pressure, modeled as a function of age and obesity. The data points are indicated, as well as the lattice of points used as knots. Care should be taken to use knots from the lattice inside the convex hull of the data (red), and ignore those outside (green).]

In practice, it is usually sufficient to work with a lattice of knots covering the domain. The penalty is computed for the reduced expansion just as before. Using $K$ knots reduces the computations to $O(NK^2 + K^3)$.
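A sketch of the reduced-knot fit, reusing tps_basis from the sketch above. Absorbing the knot-level constraint by reparametrizing $\alpha = Z\gamma$ follows the common thin-plate regression spline construction; that device, and the lattice shown in the comments, are our assumptions, not prescriptions from the text.

```python
def fit_tps_knots(X, y, knots, lam):
    # Reduced expansion on K knots: O(N K^2 + K^3) instead of O(N^3).
    N, K = X.shape[0], knots.shape[0]
    E = tps_basis(X, knots)                     # N x K basis evaluations
    Omega = tps_basis(knots, knots)             # K x K penalty matrix
    Tk = np.column_stack([np.ones(K), knots])
    Z = np.linalg.svd(Tk)[0][:, Tk.shape[1]:]   # basis of {a : Tk'a = 0}
    T = np.column_stack([np.ones(N), X])
    B = np.hstack([T, E @ Z])                   # design in (beta, gamma)
    m = T.shape[1]
    P = np.zeros((B.shape[1],) * 2)
    P[m:, m:] = Z.T @ Omega @ Z                 # bending energy in gamma
    theta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    beta, alpha = theta[:m], Z @ theta[m:]
    return lambda Xn: (tps_basis(Xn, knots) @ alpha
                       + np.column_stack([np.ones(len(Xn)), Xn]) @ beta)

# e.g., a 6 x 6 lattice of knots covering the range of two features:
#   gx = np.linspace(X[:, 0].min(), X[:, 0].max(), 6)
#   gy = np.linspace(X[:, 1].min(), X[:, 1].max(), 6)
#   knots = np.array([(a, b) for a in gx for b in gy])
```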
Figure 5.12 shows the result of fitting a thin-plate spline to some heart disease risk factors, representing the surface as a contour plot. Indicated are the locations of the input features, as well as the knots used in the fit. Note that $\lambda$ was specified via $\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda) = 15$.

More generally one can represent a function $f$ on $\mathbb{R}^d$ as an expansion in any arbitrarily large collection of basis functions, and control the complexity by applying a regularizer such as (5.38). For example, we could construct a basis by forming the tensor products of all pairs of univariate smoothing-spline basis functions as in (5.35), using, for example, the univariate B-splines recommended in Section 5.9.2 as ingredients.
This leads to an exponential growth in basis functions as the dimension increases, and typically we have to reduce the number of functions per coordinate accordingly.

The additive spline models discussed in Chapter 9 are a restricted class of multidimensional splines. They can be represented in this general formulation as well; that is, there exists a penalty $J[f]$ that guarantees that the solution has the form $f(X) = \alpha + f_1(X_1) + \cdots + f_d(X_d)$ and that each of the functions $f_j$ is a univariate spline. In this case the penalty is somewhat degenerate, and it is more natural to assume that $f$ is additive, and then simply impose an additional penalty on each of the component functions:

$$ J[f] = J(f_1 + f_2 + \cdots + f_d) = \sum_{j=1}^{d} \int f_j''(t_j)^2 \, dt_j. \qquad (5.40) $$

These are naturally extended to ANOVA spline decompositions,

$$ f(X) = \alpha + \sum_j f_j(X_j) + \sum_{j<k} f_{jk}(X_j, X_k) + \cdots, \qquad (5.41) $$

where each of the components is a spline of the required dimension.
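As a rough illustration of fitting the additive form penalized by (5.40), here is a backfitting sketch; backfitting itself is developed in Chapter 9, scipy's FITPACK smoothing spline stands in for the univariate smoother, and ties within a coordinate are assumed away.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def backfit_additive(X, y, s=None, n_iter=20):
    # Fit f(X) = alpha + f_1(X_1) + ... + f_d(X_d) by cycling through
    # coordinates and smoothing the partial residual against each one.
    # Assumes no ties within a coordinate, since FITPACK requires
    # strictly increasing x values.
    N, d = X.shape
    alpha = y.mean()
    f = np.zeros((N, d))                        # fitted values of each f_j
    for _ in range(n_iter):
        for j in range(d):
            r = y - alpha - f.sum(axis=1) + f[:, j]   # partial residual
            o = np.argsort(X[:, j])
            f[:, j] = UnivariateSpline(X[o, j], r[o], s=s)(X[:, j])
            f[:, j] -= f[:, j].mean()           # center for identifiability
    return alpha, f                             # in-sample component fits
```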
There are many choices to be made:

• The maximum order of interaction—we have shown up to order 2 above.

• Which terms to include—not all main effects and interactions are necessarily needed.

• What representation to use—some choices are:

  – regression splines with a relatively small number of basis functions per coordinate, and their tensor products for interactions;

  – a complete basis as in smoothing splines, and include appropriate regularizers for each term in the expansion.

In many cases when the number of potential dimensions (features) is large, automatic methods are more desirable.
The MARS and MART procedures (Chapters 9 and 10, respectively) both fall into this category.

5.8 Regularization and Reproducing Kernel Hilbert Spaces

In this section we cast splines into the larger context of regularization methods and reproducing kernel Hilbert spaces. This section is quite technical and can be skipped by the disinterested or intimidated reader.

A general class of regularization problems has the form

$$ \min_{f \in \mathcal{H}} \left[ \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f) \right] \qquad (5.42) $$

where $L(y, f(x))$ is a loss function, $J(f)$ is a penalty functional, and $\mathcal{H}$ is a space of functions on which $J(f)$ is defined.
Girosi et al. (1995) describe quite general penalty functionals of the form

$$ J(f) = \int_{\mathbb{R}^d} \frac{|\tilde f(s)|^2}{\tilde G(s)} \, ds, \qquad (5.43) $$

where $\tilde f$ denotes the Fourier transform of $f$, and $\tilde G$ is some positive function that falls off to zero as $\|s\| \to \infty$. The idea is that $1/\tilde G$ increases the penalty for high-frequency components of $f$.
Under some additional assumptions they show that the solutions have the form

$$ f(X) = \sum_{k=1}^{K} \alpha_k \phi_k(X) + \sum_{i=1}^{N} \theta_i G(X - x_i), \qquad (5.44) $$

where the $\phi_k$ span the null space of the penalty functional $J$, and $G$ is the inverse Fourier transform of $\tilde G$. Smoothing splines and thin-plate splines fall into this framework. The remarkable feature of this solution is that while the criterion (5.42) is defined over an infinite-dimensional space, the solution is finite-dimensional. In the next sections we look at some specific examples.

5.8.1 Spaces of Functions Generated by Kernels

An important subclass of problems of the form (5.42) is generated by a positive definite kernel $K(x, y)$, and the corresponding space of functions $\mathcal{H}_K$ is called a reproducing kernel Hilbert space (RKHS). The penalty functional $J$ is defined in terms of the kernel as well.
We give a brief and simplified introduction to this class of models, adapted from Wahba (1990) and Girosi et al. (1995), and nicely summarized in Evgeniou et al. (2000).

Let $x, y \in \mathbb{R}^p$. We consider the space of functions generated by the linear span of $\{K(\cdot, y),\ y \in \mathbb{R}^p\}$; i.e., arbitrary linear combinations of the form $f(x) = \sum_m \alpha_m K(x, y_m)$, where each kernel term is viewed as a function of the first argument, and indexed by the second. Suppose that $K$ has an eigen-expansion

$$ K(x, y) = \sum_{i=1}^{\infty} \gamma_i \phi_i(x) \phi_i(y) \qquad (5.45) $$

with $\gamma_i \ge 0$, $\sum_{i=1}^{\infty} \gamma_i^2 < \infty$. Elements of $\mathcal{H}_K$ have an expansion in terms of these eigen-functions,

$$ f(x) = \sum_{i=1}^{\infty} c_i \phi_i(x), \qquad (5.46) $$

with the constraint that

$$ \|f\|_{\mathcal{H}_K}^2 \stackrel{\mathrm{def}}{=} \sum_{i=1}^{\infty} c_i^2/\gamma_i < \infty, \qquad (5.47) $$

where $\|f\|_{\mathcal{H}_K}$ is the norm induced by $K$.
The penalty functional in (5.42) for the space $\mathcal{H}_K$ is defined to be the squared norm $J(f) = \|f\|_{\mathcal{H}_K}^2$. The quantity $J(f)$ can be interpreted as a generalized ridge penalty, where functions with large eigenvalues in the expansion (5.45) get penalized less, and vice versa.

Rewriting (5.42) we have

$$ \min_{f \in \mathcal{H}_K} \left[ \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda \|f\|_{\mathcal{H}_K}^2 \right] \qquad (5.48) $$

or equivalently

$$ \min_{\{c_j\}_1^{\infty}} \sum_{i=1}^{N} L\Big(y_i, \sum_{j=1}^{\infty} c_j \phi_j(x_i)\Big) + \lambda \sum_{j=1}^{\infty} c_j^2/\gamma_j. \qquad (5.49) $$

It can be shown (Wahba, 1990; see also Exercise 5.15) that the solution to (5.48) is finite-dimensional, and has the form

$$ f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i). \qquad (5.50) $$

The basis function $h_i(x) = K(x, x_i)$ (as a function of the first argument) is known as the representer of evaluation at $x_i$ in $\mathcal{H}_K$, since for $f \in \mathcal{H}_K$, it is easily seen that $\langle K(\cdot, x_i), f \rangle_{\mathcal{H}_K} = f(x_i)$.
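For squared-error loss the criterion (5.48) has a closed-form solution, which makes the representer form (5.50) concrete. A minimal sketch follows; the radial kernel and its gamma parameter are illustrative choices on our part, not the text's.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K(x, y) = exp(-gamma ||x - y||^2), a positive definite kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kernel_ridge(X, y, lam, kernel=rbf_kernel):
    # Squared-error case of (5.48): by (5.50), f(x) = sum_i alpha_i K(x, x_i),
    # with alpha = (K + lam*I)^{-1} y from the finite-dimensional normal
    # equations in alpha.
    K = kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Xnew: kernel(Xnew, X) @ alpha
```

With other losses in (5.48), the same finite-dimensional representation (5.50) holds, but the coefficients $\alpha$ must be found by iterative optimization rather than a single linear solve.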