The Elements of Statistical Learning: Data Mining, Inference, and Prediction (page 8)
[FIGURE 2.1. A classification example in two dimensions. The classes are coded as a binary variable (BLUE = 0, ORANGE = 1), and then fit by linear regression. The line is the decision boundary defined by x^T β̂ = 0.5. The orange shaded region denotes that part of input space classified as ORANGE, while the blue region is classified as BLUE.]

The set of points in IR² classified as ORANGE corresponds to {x : x^T β̂ > 0.5}, indicated in Figure 2.1, and the two predicted classes are separated by the decision boundary {x : x^T β̂ = 0.5}, which is linear in this case.
We see that for these data there are several misclassifications on both sides of the decision boundary. Perhaps our linear model is too rigid, or are such errors unavoidable? Remember that these are errors on the training data itself, and we have not said where the constructed data came from.
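The linear-regression classifier described above can be sketched in a few lines of numpy. This is a minimal illustration, not the book's code; the synthetic two-class sample (means and sizes) is an assumption standing in for the BLUE/ORANGE training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data: two classes with
# different means (BLUE coded as 0, ORANGE coded as 1).
X_blue = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X_orange = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X_blue, X_orange])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Augment with a column of ones so beta_hat includes an intercept,
# then fit the binary-coded response by least squares.
Xa = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(Xa, y, rcond=None)

# Decision rule: classify as ORANGE where x^T beta_hat > 0.5.
y_hat = (Xa @ beta_hat > 0.5).astype(int)
train_error = np.mean(y_hat != y)
```

Evaluating the rule `Xa @ beta_hat > 0.5` over a grid of x values traces out exactly the linear decision boundary shown in Figure 2.1.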
Consider the two possible scenarios:

Scenario 1: The training data in each class were generated from bivariate Gaussian distributions with uncorrelated components and different means.

Scenario 2: The training data in each class came from a mixture of 10 low-variance Gaussian distributions, with individual means themselves distributed as Gaussian.

A mixture of Gaussians is best described in terms of the generative model. One first generates a discrete variable that determines which of the component Gaussians to use, and then generates an observation from the chosen density. In the case of one Gaussian per class, we will see in Chapter 4 that a linear decision boundary is the best one can do, and that our estimate is almost optimal.
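The two-stage generative model of Scenario 2 can be simulated directly. In this sketch the component-mean distribution, the low variance (0.2), and the sample size are all illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n_components, n_obs = 10, 100

# Stage 0: the 10 component means are themselves Gaussian-distributed.
means = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(n_components, 2))

# Stage 1: a discrete variable picks which component Gaussian to use
# for each observation.
which = rng.integers(0, n_components, size=n_obs)

# Stage 2: draw each observation from the chosen low-variance density.
X = rng.normal(loc=means[which], scale=0.2)
```

Repeating this per class (with its own means) produces the tightly clustered, non-Gaussian class densities for which a linear boundary is a poor fit.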
The region of overlap is inevitable, and future data to be predicted will be plagued by this overlap as well.

In the case of mixtures of tightly clustered Gaussians the story is different. A linear decision boundary is unlikely to be optimal, and in fact is not. The optimal decision boundary is nonlinear and disjoint, and as such will be much more difficult to obtain.

We now look at another classification and regression procedure that is in some sense at the opposite end of the spectrum to the linear model, and far better suited to the second scenario.

2.3.2 Nearest-Neighbor Methods

Nearest-neighbor methods use those observations in the training set T closest in input space to x to form Ŷ. Specifically, the k-nearest-neighbor fit for Ŷ is defined as follows:

    Ŷ(x) = (1/k) Σ_{x_i ∈ N_k(x)} y_i,        (2.8)

where N_k(x) is the neighborhood of x defined by the k closest points x_i in the training sample.
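Equation (2.8) translates almost line for line into code. A minimal sketch, with Euclidean distance assumed as the metric:

```python
import numpy as np

def knn_fit(x, X_train, y_train, k=15):
    """k-nearest-neighbor fit of equation (2.8): average the responses
    y_i over the k training points closest to x."""
    # Euclidean distances from x to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the neighborhood N_k(x): the k closest points.
    nearest = np.argsort(dists)[:k]
    # Y_hat(x) = (1/k) * sum of y_i over N_k(x).
    return y_train[nearest].mean()
```

With a binary-coded response, `knn_fit` returns the proportion of class-1 points in the neighborhood, so thresholding it at 0.5 gives the majority-vote classifier discussed next.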
Closeness implies a metric, which for the moment we assume is Euclidean distance. So, in words, we find the k observations with x_i closest to x in input space, and average their responses.

In Figure 2.2 we use the same training data as in Figure 2.1, and use 15-nearest-neighbor averaging of the binary coded response as the method of fitting. Thus Ŷ is the proportion of ORANGEs in the neighborhood, and so assigning class ORANGE to Ĝ if Ŷ > 0.5 amounts to a majority vote in the neighborhood. The colored regions indicate all those points in input space classified as BLUE or ORANGE by such a rule, in this case found by evaluating the procedure on a fine grid in input space. We see that the decision boundaries that separate the BLUE from the ORANGE regions are far more irregular, and respond to local clusters where one class dominates.

Figure 2.3 shows the results for 1-nearest-neighbor classification: Ŷ is assigned the value y_ℓ of the closest point x_ℓ to x in the training data.
In this case the regions of classification can be computed relatively easily, and correspond to a Voronoi tessellation of the training data. Each point x_i has an associated tile bounding the region for which it is the closest input point. For all points x in the tile, Ĝ(x) = g_i. The decision boundary is even more irregular than before.

The method of k-nearest-neighbor averaging is defined in exactly the same way for regression of a quantitative output Y, although k = 1 would be an unlikely choice.
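The 1-nearest-neighbor rule is the k = 1 special case, and makes the Voronoi structure explicit: the set of x for which a given x_i wins the argmin below is exactly that point's Voronoi tile. A minimal sketch:

```python
import numpy as np

def one_nn_classify(x, X_train, g_train):
    """1-nearest-neighbor classification: assign to x the class g_i of
    the closest training point x_i (Euclidean distance)."""
    i = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return g_train[i]
```

Evaluating `one_nn_classify` on a fine grid, as in Figure 2.3, colors each Voronoi tile by the class of the training point it contains.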
[FIGURE 2.2. 15-Nearest Neighbor Classifier applied to the same two-dimensional classification example as Figure 2.1.]