The Elements of Statistical Learning: Data Mining, Inference, and Prediction (811377), page 23


Text from the file (page 23)

In this manner, partial least squares produces a sequence of derived, orthogonal inputs or directions $z_1, z_2, \ldots, z_M$. As with principal-component regression, if we were to construct all $M = p$ directions, we would get back a solution equivalent to the usual least squares estimates; using $M < p$ directions produces a reduced regression. The procedure is described fully in Algorithm 3.3.

Footnote 3: Since the $x_j$ are standardized, the first directions $\hat\varphi_{1j}$ are the univariate regression coefficients (up to an irrelevant constant); this is not the case for subsequent directions.

Algorithm 3.3 Partial Least Squares.

1. Standardize each $x_j$ to have mean zero and variance one. Set $\hat{y}^{(0)} = \bar{y}\mathbf{1}$, and $x_j^{(0)} = x_j$, $j = 1, \ldots, p$.

2. For $m = 1, 2, \ldots, p$

   (a) $z_m = \sum_{j=1}^{p} \hat\varphi_{mj} x_j^{(m-1)}$, where $\hat\varphi_{mj} = \langle x_j^{(m-1)}, y \rangle$.

   (b) $\hat\theta_m = \langle z_m, y \rangle / \langle z_m, z_m \rangle$.

   (c) $\hat{y}^{(m)} = \hat{y}^{(m-1)} + \hat\theta_m z_m$.

   (d) Orthogonalize each $x_j^{(m-1)}$ with respect to $z_m$: $x_j^{(m)} = x_j^{(m-1)} - [\langle z_m, x_j^{(m-1)} \rangle / \langle z_m, z_m \rangle] z_m$, $j = 1, 2, \ldots, p$.

3. Output the sequence of fitted vectors $\{\hat{y}^{(m)}\}_1^p$. Since the $\{z_\ell\}_1^m$ are linear in the original $x_j$, so is $\hat{y}^{(m)} = X\hat\beta^{\mathrm{pls}}(m)$. These linear coefficients can be recovered from the sequence of PLS transformations.

In the prostate cancer example, cross-validation chose $M = 2$ PLS directions in Figure 3.7. This produced the model given in the rightmost column of Table 3.3.

What optimization problem is partial least squares solving? Since it uses the response $y$ to construct its directions, its solution path is a nonlinear function of $y$. It can be shown (Exercise 3.15) that partial least squares seeks directions that have high variance and have high correlation with the response, in contrast to principal components regression, which keys only on high variance (Stone and Brooks, 1990; Frank and Friedman, 1993).
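The steps of Algorithm 3.3 can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `pls_fit`, the use of the population standard deviation for standardizing, and the guard against a vanishing direction are all assumptions of this sketch.

```python
import numpy as np


def pls_fit(X, y, n_components):
    """Fitted vectors y_hat^(0), ..., y_hat^(M) following Algorithm 3.3.

    Returns an (N, n_components + 1) array; column m holds y_hat^(m).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: standardize each column and start from the constant fit y_bar * 1.
    Xm = (X - X.mean(axis=0)) / X.std(axis=0)
    y_hat = np.full(len(y), y.mean())
    fits = [y_hat.copy()]

    for _ in range(n_components):
        phi = Xm.T @ y                  # (a) phi_mj = <x_j^(m-1), y>
        z = Xm @ phi                    #     z_m = sum_j phi_mj * x_j^(m-1)
        zz = z @ z
        if zz <= 1e-12:                 # guard added in this sketch: nothing left to fit
            fits.append(y_hat.copy())
            continue
        theta = (z @ y) / zz            # (b) theta_m = <z_m, y> / <z_m, z_m>
        y_hat = y_hat + theta * z       # (c) update the fit
        # (d) orthogonalize every current column with respect to z_m
        Xm = Xm - np.outer(z, z @ Xm) / zz
        fits.append(y_hat.copy())

    return np.column_stack(fits)
```

With `n_components` equal to the number of columns of a full-rank `X`, the last column should agree with the ordinary least squares fit (with intercept) up to rounding, and the residual sum of squares should not increase from one column to the next.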

To make the contrast between the two criteria precise, the $m$th principal component direction $v_m$ solves:

$$\max_{\alpha} \operatorname{Var}(X\alpha) \quad \text{subject to } \|\alpha\| = 1,\ \alpha^T S v_\ell = 0,\ \ell = 1, \ldots, m-1, \tag{3.63}$$

where $S$ is the sample covariance matrix of the $x_j$. The conditions $\alpha^T S v_\ell = 0$ ensure that $z_m = X\alpha$ is uncorrelated with all the previous linear combinations $z_\ell = X v_\ell$. The $m$th PLS direction $\hat\varphi_m$ solves:

$$\max_{\alpha} \operatorname{Corr}^2(y, X\alpha)\operatorname{Var}(X\alpha) \quad \text{subject to } \|\alpha\| = 1,\ \alpha^T S \hat\varphi_\ell = 0,\ \ell = 1, \ldots, m-1. \tag{3.64}$$

Further analysis reveals that the variance aspect tends to dominate, and so partial least squares behaves much like ridge regression and principal components regression. We discuss this further in the next section.

If the input matrix $X$ is orthogonal, then partial least squares finds the least squares estimates after $m = 1$ steps. Subsequent steps have no effect since the $\hat\varphi_{mj}$ are zero for $m > 1$ (Exercise 3.14). It can also be shown that the sequence of PLS coefficients for $m = 1, 2, \ldots, p$ represents the conjugate gradient sequence for computing the least squares solutions (Exercise 3.18).

3.6 Discussion: A Comparison of the Selection and Shrinkage Methods

There are some simple settings where we can understand better the relationship between the different methods described above. Consider an example with two correlated inputs $X_1$ and $X_2$, with correlation $\rho$. We assume that the true regression coefficients are $\beta_1 = 4$ and $\beta_2 = 2$. Figure 3.18 shows the coefficient profiles for the different methods, as their tuning parameters are varied. The top panel has $\rho = 0.5$, the bottom panel $\rho = -0.5$. The tuning parameters for ridge and lasso vary over a continuous range, while best subset, PLS and PCR take just two discrete steps to the least squares solution.

In the top panel, starting at the origin, ridge regression shrinks the coefficients together until it finally converges to least squares. PLS and PCR show similar behavior to ridge, although they are discrete and more extreme. Best subset overshoots the solution and then backtracks. The behavior of the lasso is intermediate to the other methods.
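The ridge profile for the top panel can be traced numerically under a simplifying assumption: a noise-free population setting in which the standardized inputs have covariance $\Sigma = [[1, \rho], [\rho, 1]]$, so the ridge coefficients are $(\Sigma + \lambda I)^{-1}\Sigma\beta$. The function name `ridge_path` and the grid of $\lambda$ values are choices made for this sketch; the figure itself may have been produced differently.

```python
import numpy as np


def ridge_path(rho, beta, lambdas):
    """Population ridge coefficients (Sigma + lam*I)^{-1} Sigma beta for two inputs."""
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    target = sigma @ beta
    return np.array(
        [np.linalg.solve(sigma + lam * np.eye(2), target) for lam in lambdas]
    )


beta = np.array([4.0, 2.0])                       # true coefficients from the text
lambdas = np.array([1e6, 100.0, 10.0, 1.0, 0.1, 0.0])
path = ridge_path(0.5, beta, lambdas)             # one row per lambda, from heavy to no shrinkage

for lam, coef in zip(lambdas, path):
    print(f"lambda={lam:>9.1f}  beta1={coef[0]:.4f}  beta2={coef[1]:.4f}")
```

Under this assumption the path leaves the origin roughly along $\Sigma\beta = (5, 4)$, so the two coefficients start out closer together than $(4, 2)$ and separate only as $\lambda$ approaches zero, where the path reaches the least squares solution.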

When the correlation is negative (lower panel), again PLS and PCR roughly track the ridge path, while all of the methods are more similar to one another.

It is interesting to compare the shrinkage behavior of these different methods. Recall that ridge regression shrinks all directions, but shrinks low-variance directions more. Principal components regression leaves $M$ high-variance directions alone, and discards the rest. Interestingly, it can be shown that partial least squares also tends to shrink the low-variance directions, but can actually inflate some of the higher variance directions. This can make PLS a little unstable, and cause it to have slightly higher prediction error compared to ridge regression.
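The ridge and PCR statements above can be made concrete with per-direction shrinkage factors. The sketch assumes the standard result that, along the principal component direction with singular value $d_j$ of the centered input matrix, ridge multiplies the least squares fit by $d_j^2/(d_j^2 + \lambda)$, while PCR multiplies it by 1 for the $M$ largest $d_j$ and by 0 otherwise. The function names and the example singular values are hypothetical; PLS is omitted because its factors depend on the response.

```python
import numpy as np


def ridge_shrinkage(d, lam):
    """Ridge factor d_j^2 / (d_j^2 + lam) for each singular value d_j."""
    d2 = np.asarray(d, dtype=float) ** 2
    return d2 / (d2 + lam)


def pcr_shrinkage(d, n_components):
    """PCR factor: 1 for the n_components largest singular values, 0 for the rest."""
    d = np.asarray(d, dtype=float)
    keep = np.argsort(d)[::-1][:n_components]
    factors = np.zeros(len(d))
    factors[keep] = 1.0
    return factors


d = np.array([3.0, 2.0, 1.0, 0.5])       # illustrative singular values, largest first
ridge = ridge_shrinkage(d, lam=1.0)      # [0.9, 0.8, 0.5, 0.2]: smooth, stronger for small d_j
pcr = pcr_shrinkage(d, n_components=2)   # [1, 1, 0, 0]: keep two directions, drop the rest
```

The ridge factors decrease smoothly with $d_j$, whereas the PCR factors form a hard threshold, which is one way to read the "smooth versus discrete" contrast drawn in the text.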

A full study of this behavior is given in Frank and Friedman (1993). These authors conclude that for minimizing prediction error, ridge regression is generally preferable to variable subset selection, principal components regression and partial least squares. However, the improvement over the latter two methods was only slight.

To summarize, PLS, PCR and ridge regression tend to behave similarly. Ridge regression may be preferred because it shrinks smoothly, rather than in discrete steps. Lasso falls somewhere between ridge regression and best subset regression, and enjoys some of the properties of each.

[Figure 3.18: two panels, $\rho = 0.5$ (top) and $\rho = -0.5$ (bottom), plotting $\beta_2$ against $\beta_1$ with profiles for Ridge, Lasso, PLS, PCR, Best Subset and Least Squares.]

FIGURE 3.18. Coefficient profiles from different methods for a simple problem: two inputs with correlation $\pm 0.5$, and the true regression coefficients $\beta = (4, 2)$.

3.7 Multiple Outcome Shrinkage and Selection

As noted in Section 3.2.4, the least squares estimates in a multiple-output linear model are simply the individual least squares estimates for each of the outputs.

To apply selection and shrinkage methods in the multiple output case, one could apply a univariate technique individually to each outcome or simultaneously to all outcomes.
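The two options can be sketched with ridge regression standing in as the univariate technique (a choice made for this illustration). The code assumes centered inputs, no intercept, and the usual closed form $(X^T X + \lambda I)^{-1} X^T y$; the function names are hypothetical.

```python
import numpy as np


def ridge_coefficients(X, Y, lam):
    """Closed-form ridge solution; Y may be a single response or an N x K matrix."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)


def ridge_per_outcome(X, Y, lams):
    """Fit each column of Y separately, each with its own penalty from lams."""
    columns = [ridge_coefficients(X, Y[:, k], lam) for k, lam in enumerate(lams)]
    return np.column_stack(columns)
```

Calling `ridge_coefficients(X, Y, lam)` on the whole outcome matrix applies one shared penalty to every column, while `ridge_per_outcome(X, Y, lams)` allows a different amount of regularization per outcome. With equal penalties the two coincide, and with all penalties set to zero both reduce to column-by-column least squares.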

With ridge regression, for example, we could apply formula (3.44) to each of the $K$ columns of the outcome matrix $Y$, using possibly different parameters $\lambda$, or apply it to all columns using the same value of $\lambda$. The former strategy would allow different amounts of regularization to be applied to different outcomes but require estimation of $K$ separate regularization parameters $\lambda_1, \ldots, \lambda_K$, while the latter would permit all $K$ outputs to be used in estimating the sole regularization parameter $\lambda$.

Other more sophisticated shrinkage and selection strategies that exploit correlations in the different responses can be helpful in the multiple output case. Suppose for example that among the outputs we have

$$Y_k = f(X) + \varepsilon_k \tag{3.65}$$

$$Y_\ell = f(X) + \varepsilon_\ell; \tag{3.66}$$

i.e., (3.65) and (3.66) share the same structural part $f(X)$ in their models. It is clear in this case that we should pool our observations on $Y_k$ and $Y_\ell$ to estimate the common $f$.

Combining responses is at the heart of canonical correlation analysis (CCA), a data reduction technique developed for the multiple output case. Similar to PCA, CCA finds a sequence of uncorrelated linear combinations $X v_m$, $m = 1, \ldots, M$ of the $x_j$, and a corresponding sequence of uncorrelated linear combinations $Y u_m$ of the responses $y_k$, such that the correlations

$$\operatorname{Corr}^2(Y u_m, X v_m) \tag{3.67}$$

are successively maximized. Note that at most $M = \min(K, p)$ directions can be found. The leading canonical response variates are those linear combinations (derived responses) best predicted by the $x_j$; in contrast, the trailing canonical variates can be poorly predicted by the $x_j$, and are candidates for being dropped. The CCA solution is computed using a generalized SVD of the sample cross-covariance matrix $Y^T X / N$ (assuming $Y$ and $X$ are centered; Exercise 3.20).

Reduced-rank regression (Izenman, 1975; van der Merwe and Zidek, 1980) formalizes this approach in terms of a regression model that explicitly pools information.

Given an error covariance $\operatorname{Cov}(\varepsilon) = \Sigma$, we solve the following restricted multivariate regression problem:

$$\hat{B}^{\mathrm{rr}}(m) = \operatorname*{argmin}_{\operatorname{rank}(B) = m} \sum_{i=1}^{N} (y_i - B^T x_i)^T \Sigma^{-1} (y_i - B^T x_i). \tag{3.68}$$

With $\Sigma$ replaced by the estimate $Y^T Y / N$, one can show (Exercise 3.21) that the solution is given by a CCA of $Y$ and $X$:

$$\hat{B}^{\mathrm{rr}}(m) = \hat{B} U_m U_m^{-}, \tag{3.69}$$

where $U_m$ is the $K \times m$ sub-matrix of $U$ consisting of the first $m$ columns, and $U$ is the $K \times M$ matrix of left canonical vectors $u_1, u_2, \ldots, u_M$. $U_m^{-}$ is its generalized inverse. Writing the solution as

$$\hat{B}^{\mathrm{rr}}(m) = (X^T X)^{-1} X^T (Y U_m) U_m^{-}, \tag{3.70}$$

we see that reduced-rank regression performs a linear regression on the pooled response matrix $Y U_m$, and then maps the coefficients (and hence the fits as well) back to the original response space. The reduced-rank fits are given by

$$\hat{Y}^{\mathrm{rr}}(m) = X (X^T X)^{-1} X^T Y U_m U_m^{-} = H Y P_m, \tag{3.71}$$

where $H$ is the usual linear regression projection operator, and $P_m$ is the rank-$m$ CCA response projection operator. Although a better estimate of $\Sigma$ would be $(Y - X\hat{B})^T (Y - X\hat{B}) / (N - pK)$, one can show that the solution remains the same (Exercise 3.22).

Reduced-rank regression borrows strength among responses by truncating the CCA.
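A compact numerical sketch of (3.69) follows. It makes several assumptions that go beyond the text: the generalized SVD is carried out by whitening, that is, an ordinary SVD of $\Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2}$; the generalized inverse is taken to be the left inverse $U_m^T \Sigma_{YY}$ that this whitening suggests; and $X$ and $Y$ are centered with full column rank. The function names are hypothetical.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(w ** -0.5) @ Q.T


def cca(X, Y):
    """Canonical vectors via an SVD of the whitened cross-covariance.

    Returns U (K x M), V (p x M) and the canonical correlations, M = min(K, p).
    """
    N = X.shape[0]
    Sxx, Syy, Syx = X.T @ X / N, Y.T @ Y / N, Y.T @ X / N
    Sxx_ih, Syy_ih = inv_sqrt(Sxx), inv_sqrt(Syy)
    Us, corr, Vst = np.linalg.svd(Syy_ih @ Syx @ Sxx_ih, full_matrices=False)
    return Syy_ih @ Us, Sxx_ih @ Vst.T, corr


def reduced_rank_regression(X, Y, m):
    """Rank-m coefficients B_hat @ U_m @ U_m^-, with Sigma estimated by Y^T Y / N."""
    N = X.shape[0]
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # ordinary least squares, p x K
    U, _, _ = cca(X, Y)
    U_m = U[:, :m]
    U_m_ginv = U_m.T @ (Y.T @ Y / N)            # left inverse: U_m_ginv @ U_m = I_m
    return B_hat @ U_m @ U_m_ginv
```

Under these assumptions the returned matrix has rank $m$, and when $K \le p$ and $m = K$ it coincides with the full least squares coefficient matrix, since $U$ is then square and invertible.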

Breiman and Friedman (1997) explored with some success shrinkage of the canonical variates between $X$ and $Y$, a smooth version of reduced rank regression. Their proposal has the form (compare (3.69))

$$\hat{B}^{\mathrm{c+w}} = \hat{B} U \Lambda U^{-1}, \tag{3.72}$$

where $\Lambda$ is a diagonal shrinkage matrix (the "c+w" stands for "Curds and Whey," the name they gave to their procedure). Based on optimal prediction in the population setting, they show that $\Lambda$ has diagonal entries

$$\lambda_m = \frac{c_m^2}{c_m^2 + \frac{p}{N}(1 - c_m^2)}, \quad m = 1, \ldots$$
