The Elements of Statistical Learning: Data Mining, Inference, and Prediction (811377), page 26


This has an explicit solution, resulting in the update

$$\tilde\beta_j(\lambda) \leftarrow S\Big(\sum_{i=1}^{N} x_{ij}\,\big(y_i - \tilde y_i^{(j)}\big),\ \lambda\Big). \tag{3.84}$$

Here $S(t, \lambda) = \operatorname{sign}(t)(|t| - \lambda)_+$ is the soft-thresholding operator in Table 3.4 on page 71. The first argument to $S(\cdot)$ is the simple least-squares coefficient of the partial residual on the standardized variable $x_{ij}$. Repeated iteration of (3.84), cycling through each variable in turn until convergence, yields the lasso estimate $\hat\beta(\lambda)$.

We can also use this simple algorithm to efficiently compute the lasso solutions at a grid of values of $\lambda$. We start with the smallest value $\lambda_{\max}$ for which $\hat\beta(\lambda_{\max}) = 0$, decrease it a little and cycle through the variables until convergence. Then $\lambda$ is decreased again and the process is repeated, using the previous solution as a "warm start" for the new value of $\lambda$. This can be faster than the LARS algorithm, especially in large problems. A key to its speed is the fact that the quantities in (3.84) can be updated quickly as $j$ varies, and often the update is to leave $\tilde\beta_j = 0$. On the other hand, it delivers solutions over a grid of $\lambda$ values, rather than the entire solution path.
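The pathwise procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the columns of X are centered and scaled to unit Euclidean norm, that y is centered, and that the criterion is 0.5*||y - X b||^2 + lam*||b||_1. The function names (`soft_threshold`, `lasso_cd`, `lasso_path`), the grid settings and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def soft_threshold(t, lam):
    """S(t, lam) = sign(t) * (|t| - lam)_+ ."""
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

def lasso_cd(X, y, lam, beta0=None, tol=1e-10, max_sweeps=10_000):
    """Cyclic coordinate descent using the soft-thresholding update.

    Assumes unit-norm, centered columns of X and a centered y.
    """
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    resid = y - X @ beta                      # full residual, kept up to date
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # x_j' (partial residual) = x_j' resid + beta_j, since x_j' x_j = 1
            t = X[:, j] @ resid + beta[j]
            new = soft_threshold(t, lam)
            if new != beta[j]:                # often nothing changes (stays 0)
                resid -= X[:, j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta

def lasso_path(X, y, n_lambda=20, ratio=1e-3):
    """Solve on a decreasing grid of lam values with warm starts."""
    lam_max = np.max(np.abs(X.T @ y))         # smallest lam giving beta = 0
    lams = lam_max * np.logspace(0, np.log10(ratio), n_lambda)
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lams:
        beta = lasso_cd(X, y, lam, beta0=beta)  # warm start from previous lam
        path.append(beta.copy())
    return lams, np.array(path)

# Small synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + 0.1 * rng.standard_normal(50)
y -= y.mean()
lams, path = lasso_path(X, y)
```

Keeping the full residual up to date is what makes each coordinate update cheap: a coefficient that stays at zero costs one inner product and no further work.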

The same kind of algorithm can be applied to the elastic net, the grouped lasso and many other models in which the penalty is a sum of functions of the individual parameters (Friedman et al., 2010). It can also be applied, with some substantial modifications, to the fused lasso (Section 18.4.2); details are in Friedman et al. (2007).

3.9 Computational Considerations

Least squares fitting is usually done via the Cholesky decomposition of the matrix $\mathbf{X}^T\mathbf{X}$ or a QR decomposition of $\mathbf{X}$.
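As a rough illustration of the two routes just named, the sketch below solves the same least squares problem once through a Cholesky factorization of X^T X and once through a thin QR decomposition of X. The function names and the synthetic data are hypothetical; the general-purpose `np.linalg.solve` is used for the triangular systems only for brevity, where a dedicated triangular solver would exploit the structure.

```python
import numpy as np

def lstsq_cholesky(X, y):
    """Solve the normal equations X'X b = X'y via X'X = L L'."""
    L = np.linalg.cholesky(X.T @ X)      # lower-triangular factor
    z = np.linalg.solve(L, X.T @ y)      # forward solve: L z = X'y
    return np.linalg.solve(L.T, z)       # back solve:    L' b = z

def lstsq_qr(X, y):
    """Solve min ||y - X b|| via the thin QR decomposition X = Q R."""
    Q, R = np.linalg.qr(X)               # default mode returns the reduced form
    return np.linalg.solve(R, Q.T @ y)   # R b = Q'y

# Made-up data: an intercept column plus four random features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.standard_normal((100, 4))])
y = rng.standard_normal(100)

b_chol = lstsq_cholesky(X, y)
b_qr = lstsq_qr(X, y)
```

On a well-conditioned problem like this one the two answers agree to rounding error. Forming X^T X squares the condition number, which is one reason the Cholesky route can be less stable when the columns of X are nearly collinear.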

With $N$ observations and $p$ features, the Cholesky decomposition requires $p^3 + Np^2/2$ operations, while the QR decomposition requires $Np^2$ operations. Depending on the relative size of $N$ and $p$, the Cholesky can sometimes be faster; on the other hand, it can be less numerically stable (Lawson and Hansen, 1974). Computation of the lasso via the LAR algorithm has the same order of computation as a least squares fit.

3. Linear Methods for Regression

Bibliographic Notes

Linear regression is discussed in many statistics books, for example, Seber (1984), Weisberg (1980) and Mardia et al. (1979).

Ridge regression was introduced by Hoerl and Kennard (1970), while the lasso was proposed by Tibshirani (1996). Around the same time, lasso-type penalties were proposed in the basis pursuit method for signal processing (Chen et al., 1998). The least angle regression procedure was proposed in Efron et al. (2004); related to this is the earlier homotopy procedure of Osborne et al. (2000a) and Osborne et al. (2000b). Their algorithm also exploits the piecewise linearity used in the LAR/lasso algorithm, but lacks its transparency.

The criterion for the forward stagewise procedure is discussed in Hastie et al. (2007). Park and Hastie (2007) develop a path algorithm similar to least angle regression for generalized regression models. Partial least squares was introduced by Wold (1975). Comparisons of shrinkage methods may be found in Copas (1983) and Frank and Friedman (1993).

Exercises

Ex. 3.1 Show that the $F$ statistic (3.13) for dropping a single coefficient from a model is equal to the square of the corresponding $z$-score (3.12).

Ex. 3.2 Given data on two variables $X$ and $Y$, consider fitting a cubic polynomial regression model $f(X) = \sum_{j=0}^{3} \beta_j X^j$.

In addition to plotting the fitted curve, you would like a 95% confidence band about the curve. Consider the following two approaches:

1. At each point $x_0$, form a 95% confidence interval for the linear function $a^T\beta = \sum_{j=0}^{3} \beta_j x_0^j$.
2. Form a 95% confidence set for $\beta$ as in (3.15), which in turn generates confidence intervals for $f(x_0)$.

How do these approaches differ? Which band is likely to be wider? Conduct a small simulation experiment to compare the two methods.

Ex. 3.3 Gauss–Markov theorem:

(a) Prove the Gauss–Markov theorem: the least squares estimate of a parameter $a^T\beta$ has variance no bigger than that of any other linear unbiased estimate of $a^T\beta$ (Section 3.2.2).
(b) The matrix inequality $\mathbf{B} \preceq \mathbf{A}$ holds if $\mathbf{A} - \mathbf{B}$ is positive semidefinite. Show that if $\hat{\mathbf{V}}$ is the variance-covariance matrix of the least squares estimate of $\beta$ and $\tilde{\mathbf{V}}$ is the variance-covariance matrix of any other linear unbiased estimate, then $\hat{\mathbf{V}} \preceq \tilde{\mathbf{V}}$.

Ex. 3.4 Show how the vector of least squares coefficients can be obtained from a single pass of the Gram–Schmidt procedure (Algorithm 3.1). Represent your solution in terms of the QR decomposition of $\mathbf{X}$.

Ex. 3.5 Consider the ridge regression problem (3.41). Show that this problem is equivalent to the problem

$$\hat\beta^c = \operatorname*{argmin}_{\beta^c} \Bigg\{ \sum_{i=1}^{N} \Big[ y_i - \beta_0^c - \sum_{j=1}^{p} (x_{ij} - \bar x_j)\beta_j^c \Big]^2 + \lambda \sum_{j=1}^{p} (\beta_j^c)^2 \Bigg\}. \tag{3.85}$$

Give the correspondence between $\beta^c$ and the original $\beta$ in (3.41). Characterize the solution to this modified criterion.

Show that a similar result holds for the lasso.

Ex. 3.6 Show that the ridge regression estimate is the mean (and mode) of the posterior distribution, under a Gaussian prior $\beta \sim N(0, \tau\mathbf{I})$, and Gaussian sampling model $\mathbf{y} \sim N(\mathbf{X}\beta, \sigma^2\mathbf{I})$.

Find the relationship between the regularization parameter $\lambda$ in the ridge formula, and the variances $\tau$ and $\sigma^2$.

Ex. 3.7 Assume $y_i \sim N(\beta_0 + x_i^T\beta, \sigma^2)$, $i = 1, 2, \ldots, N$, and the parameters $\beta_j$, $j = 1, \ldots, p$ are each distributed as $N(0, \tau^2)$, independently of one another. Assuming $\sigma^2$ and $\tau^2$ are known, and $\beta_0$ is not governed by a prior (or has a flat improper prior), show that the (minus) log-posterior density of $\beta$ is proportional to $\sum_{i=1}^{N} \big(y_i - \beta_0 - \sum_j x_{ij}\beta_j\big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$, where $\lambda = \sigma^2/\tau^2$.

Ex. 3.8 Consider the QR decomposition of the uncentered $N \times (p + 1)$ matrix $\mathbf{X}$ (whose first column is all ones), and the SVD of the $N \times p$ centered matrix $\tilde{\mathbf{X}}$. Show that $\mathbf{Q}_2$ and $\mathbf{U}$ span the same subspace, where $\mathbf{Q}_2$ is the sub-matrix of $\mathbf{Q}$ with the first column removed.

Under what circumstances will they be the same, up to sign flips?

Ex. 3.9 Forward stepwise regression. Suppose we have the QR decomposition for the $N \times q$ matrix $\mathbf{X}_1$ in a multiple regression problem with response $\mathbf{y}$, and we have an additional $p - q$ predictors in the matrix $\mathbf{X}_2$. Denote the current residual by $\mathbf{r}$. We wish to establish which one of these additional variables will reduce the residual sum-of-squares the most when included with those in $\mathbf{X}_1$. Describe an efficient procedure for doing this.

Ex. 3.10 Backward stepwise regression. Suppose we have the multiple regression fit of $\mathbf{y}$ on $\mathbf{X}$, along with the standard errors and Z-scores as in Table 3.2.

We wish to establish which variable, when dropped, will increase the residual sum-of-squares the least. How would you do this?

Ex. 3.11 Show that the solution to the multivariate linear regression problem (3.40) is given by (3.39). What happens if the covariance matrices $\Sigma_i$ are different for each observation?

Ex. 3.12 Show that the ridge regression estimates can be obtained by ordinary least squares regression on an augmented data set. We augment the centered matrix $\mathbf{X}$ with $p$ additional rows $\sqrt{\lambda}\,\mathbf{I}$, and augment $\mathbf{y}$ with $p$ zeros. By introducing artificial data having response value zero, the fitting procedure is forced to shrink the coefficients toward zero.
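The augmented-data construction of Ex. 3.12 is easy to check numerically before attempting the proof. The sketch below is only a sanity check on made-up data, not a derivation: it compares ordinary least squares on the augmented set with the standard closed-form ridge solution $(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$. The sizes, the value of `lam` and the variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, lam = 40, 3, 2.5                    # arbitrary sizes and penalty

X = rng.standard_normal((N, p))
X -= X.mean(axis=0)                       # centered inputs, as in the exercise
y = rng.standard_normal(N)
y -= y.mean()                             # centered response (no intercept needed)

# Closed-form ridge estimate.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Ordinary least squares on the augmented data:
# p extra rows sqrt(lam) * I appended to X, p zeros appended to y.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
beta_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

# The two coefficient vectors should agree up to rounding error.
max_abs_diff = np.max(np.abs(beta_ridge - beta_aug))
```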

The augmented-data construction is related to the idea of hints due to Abu-Mostafa (1995), where model constraints are implemented by adding artificial data examples that satisfy them.

Ex. 3.13 Derive the expression (3.62), and show that $\hat\beta^{\text{pcr}}(p) = \hat\beta^{\text{ls}}$.

Ex. 3.14 Show that in the orthogonal case, PLS stops after $m = 1$ steps, because subsequent $\hat\varphi_{mj}$ in step 2 in Algorithm 3.3 are zero.

Ex. 3.15 Verify expression (3.64), and hence show that the partial least squares directions are a compromise between the ordinary regression coefficient and the principal component directions.

Ex. 3.16 Derive the entries in Table 3.4, the explicit forms for estimators in the orthogonal case.

Ex. 3.17 Repeat the analysis of Table 3.3 on the spam data discussed in Chapter 1.

Ex. 3.18 Read about conjugate gradient algorithms (Murray et al., 1981, for example), and establish a connection between these algorithms and partial least squares.

Ex. 3.19 Show that $\|\hat\beta^{\text{ridge}}\|$ increases as its tuning parameter $\lambda \to 0$. Does the same property hold for the lasso and partial least squares estimates? For the latter, consider the "tuning parameter" to be the successive steps in the algorithm.

Ex. 3.20 Consider the canonical-correlation problem (3.67). Show that the leading pair of canonical variates $u_1$ and $v_1$ solve the problem

$$\max_{\substack{u^T(\mathbf{Y}^T\mathbf{Y})u = 1 \\ v^T(\mathbf{X}^T\mathbf{X})v = 1}} u^T(\mathbf{Y}^T\mathbf{X})v, \tag{3.86}$$

a generalized SVD problem. Show that the solution is given by $u_1 = (\mathbf{Y}^T\mathbf{Y})^{-1/2} u_1^*$, and $v_1 = (\mathbf{X}^T\mathbf{X})^{-1/2} v_1^*$, where $u_1^*$ and $v_1^*$ are the leading left and right singular vectors in

$$(\mathbf{Y}^T\mathbf{Y})^{-1/2}(\mathbf{Y}^T\mathbf{X})(\mathbf{X}^T\mathbf{X})^{-1/2} = \mathbf{U}^*\mathbf{D}^*\mathbf{V}^{*T}. \tag{3.87}$$

Show that the entire sequence $u_m, v_m$, $m = 1, \ldots, \min(K, p)$ is also given by (3.87).

Ex. 3.21 Show that the solution to the reduced-rank regression problem (3.68), with $\Sigma$ estimated by $\mathbf{Y}^T\mathbf{Y}/N$, is given by (3.69). Hint: Transform $\mathbf{Y}$ to $\mathbf{Y}^* = \mathbf{Y}\Sigma^{-1/2}$, and solve in terms of the canonical vectors $u_m^*$. Show that $\mathbf{U}_m = \Sigma^{-1/2}\mathbf{U}_m^*$, and a generalized inverse is $\mathbf{U}_m^- = \mathbf{U}_m^{*T}\Sigma^{1/2}$.

Ex. 3.22 Show that the solution in Exercise 3.21 does not change if $\Sigma$ is estimated by the more natural quantity $(\mathbf{Y} - \mathbf{X}\hat{\mathbf{B}})^T(\mathbf{Y} - \mathbf{X}\hat{\mathbf{B}})/(N - pK)$.

Ex. 3.23 Consider a regression problem with all variables and response having mean zero and standard deviation one. Suppose also that each variable has identical absolute correlation with the response:

$$\frac{1}{N}\,\big|\langle \mathbf{x}_j, \mathbf{y} \rangle\big| = \lambda, \quad j = 1, \ldots, p.$$

Let $\hat\beta$ be the least-squares coefficient of $\mathbf{y}$ on $\mathbf{X}$, and let $\mathbf{u}(\alpha) = \alpha\mathbf{X}\hat\beta$ for $\alpha \in [0, 1]$ be the vector that moves a fraction $\alpha$ toward the least squares fit $\mathbf{u}$.
