The Elements of Statistical Learning: Data Mining, Inference, and Prediction (811377), page 6


Text from the file (page 6)

Model assessment and selection is the topic of Chapter 7, covering the concepts of bias and variance, overfitting and methods such as cross-validation for choosing models. Chapter 8 discusses model inference and averaging, including an overview of maximum likelihood, Bayesian inference and the bootstrap, the EM algorithm, Gibbs sampling and bagging. A related procedure called boosting is the focus of Chapter 10.

In Chapters 9–13 we describe a series of structured methods for supervised learning, with Chapters 9 and 11 covering regression and Chapters 12 and 13 focusing on classification. Chapter 14 describes methods for unsupervised learning.

Two recently proposed techniques, random forests and ensemble learning, are discussed in Chapters 15 and 16. We describe undirected graphical models in Chapter 17 and finally we study high-dimensional problems in Chapter 18.

At the end of each chapter we discuss computational considerations important for data mining applications, including how the computations scale with the number of observations and predictors. Each chapter ends with Bibliographic Notes giving background references for the material.

We recommend that Chapters 1–4 be first read in sequence.

Chapter 7 should also be considered mandatory, as it covers central concepts that pertain to all learning methods. With this in mind, the rest of the book can be read sequentially, or sampled, depending on the reader’s interest.

A margin symbol (not reproduced in this extract) indicates a technically difficult section, one that can be skipped without interrupting the flow of the discussion.

Book Website

The website for this book is located at

http://www-stat.stanford.edu/ElemStatLearn

It contains a number of resources, including many of the datasets used in this book.

Note for Instructors

We have successfully used the first edition of this book as the basis for a two-quarter course, and with the additional materials in this second edition, it could even be used for a three-quarter sequence.

Exercises are provided at the end of each chapter. It is important for students to have access to good software tools for these topics. We used the R and S-PLUS programming languages in our courses.

2 Overview of Supervised Learning

2.1 Introduction

The first three examples described in Chapter 1 have several components in common. For each there is a set of variables that might be denoted as inputs, which are measured or preset. These have some influence on one or more outputs.

For each example the goal is to use the inputs to predict the values of the outputs. This exercise is called supervised learning.

We have used the more modern language of machine learning. In the statistical literature the inputs are often called the predictors, a term we will use interchangeably with inputs, and more classically the independent variables. In the pattern recognition literature the term features is preferred, which we use as well. The outputs are called the responses, or classically the dependent variables.

2.2 Variable Types and Terminology

The outputs vary in nature among the examples. In the glucose prediction example, the output is a quantitative measurement, where some measurements are bigger than others, and measurements close in value are close in nature. In the famous Iris discrimination example due to R. A. Fisher, the output is qualitative (species of Iris) and assumes values in a finite set $\mathcal{G} = \{\text{Virginica}, \text{Setosa}, \text{Versicolor}\}$. In the handwritten digit example the output is one of 10 different digit classes: $\mathcal{G} = \{0, 1, \ldots, 9\}$. In both of these there is no explicit ordering in the classes, and in fact often descriptive labels rather than numbers are used to denote the classes.

Qualitative variables are also referred to as categorical or discrete variables as well as factors.

For both types of outputs it makes sense to think of using the inputs to predict the output. Given some specific atmospheric measurements today and yesterday, we want to predict the ozone level tomorrow. Given the grayscale values for the pixels of the digitized image of the handwritten digit, we want to predict its class label.

This distinction in output type has led to a naming convention for the prediction tasks: regression when we predict quantitative outputs, and classification when we predict qualitative outputs. We will see that these two tasks have a lot in common, and in particular both can be viewed as a task in function approximation.

Inputs also vary in measurement type; we can have some of each of qualitative and quantitative input variables. These have also led to distinctions in the types of methods that are used for prediction: some methods are defined most naturally for quantitative inputs, some most naturally for qualitative and some for both.

A third variable type is ordered categorical, such as small, medium and large, where there is an ordering between the values, but no metric notion is appropriate (the difference between medium and small need not be the same as that between large and medium).

These are discussed further in Chapter 4.

Qualitative variables are typically represented numerically by codes. The easiest case is when there are only two classes or categories, such as “success” or “failure,” “survived” or “died.” These are often represented by a single binary digit or bit as 0 or 1, or else by −1 and 1. For reasons that will become apparent, such numeric codes are sometimes referred to as targets. When there are more than two categories, several alternatives are available. The most useful and commonly used coding is via dummy variables. Here a K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is “on” at a time.
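The dummy-variable coding described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `dummy_code`, the alphabetical ordering of the levels, and the sample labels are assumptions made for the example, not something the text specifies.

```python
def dummy_code(labels, levels=None):
    """Encode qualitative labels as K binary indicator variables, one per level."""
    if levels is None:
        levels = sorted(set(labels))  # assumption: fix the K levels in sorted order
    index = {level: k for k, level in enumerate(levels)}
    rows = []
    for label in labels:
        row = [0] * len(levels)
        row[index[label]] = 1  # exactly one of the K bits is "on"
        rows.append(row)
    return levels, rows


# Hypothetical sample of Iris species labels.
species = ["Setosa", "Virginica", "Versicolor", "Setosa"]
levels, coded = dummy_code(species)

print(levels)
for label, row in zip(species, coded):
    print(label, row)
```

Each coded row sums to 1, and no level is singled out as a reference category, which is the symmetry the text refers to.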

Although more compact coding schemes are possible, dummy variables are symmetric in the levels of the factor.

We will typically denote an input variable by the symbol $X$. If $X$ is a vector, its components can be accessed by subscripts $X_j$. Quantitative outputs will be denoted by $Y$, and qualitative outputs by $G$ (for group). We use uppercase letters such as $X$, $Y$ or $G$ when referring to the generic aspects of a variable.

Observed values are written in lowercase; hence the ith observed value of $X$ is written as $x_i$ (where $x_i$ is again a scalar or vector). Matrices are represented by bold uppercase letters; for example, a set of $N$ input p-vectors $x_i$, $i = 1, \ldots, N$ would be represented by the $N \times p$ matrix $\mathbf{X}$. In general, vectors will not be bold, except when they have $N$ components; this convention distinguishes a p-vector of inputs $x_i$ for the ith observation from the N-vector $\mathbf{x}_j$ consisting of all the observations on variable $X_j$.
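To make the distinction between $x_i$ and $\mathbf{x}_j$ concrete, here is a small sketch that stores the $N \times p$ matrix as a list of rows. The numbers and the helper names `observation` and `variable` are made up for illustration, and Python indices start at 0 whereas the text counts from 1.

```python
# N = 4 observations on p = 2 input variables (made-up numbers).
X = [
    [1.0, 5.0],
    [2.0, 6.0],
    [3.0, 7.0],
    [4.0, 8.0],
]
N, p = len(X), len(X[0])


def observation(X, i):
    """x_i: the p-vector of inputs for one observation (a row of X)."""
    return list(X[i])


def variable(X, j):
    """Bold x_j: the N-vector of all observations on one variable (a column of X)."""
    return [row[j] for row in X]


x_2 = observation(X, 1)    # second observation (0-based index 1), length p
x_col_1 = variable(X, 0)   # first variable (0-based index 0), length N

print(N, p)
print(x_2)
print(x_col_1)
```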

Since all vectors are assumed to be column vectors, the ith row of $\mathbf{X}$ is $x_i^T$, the vector transpose of $x_i$.

For the moment we can loosely state the learning task as follows: given the value of an input vector $X$, make a good prediction of the output $Y$, denoted by $\hat{Y}$ (pronounced “y-hat”). If $Y$ takes values in $\mathbb{R}$ then so should $\hat{Y}$; likewise for categorical outputs, $\hat{G}$ should take values in the same set $\mathcal{G}$ associated with $G$.

For a two-class $G$, one approach is to denote the binary coded target as $Y$, and then treat it as a quantitative output. The predictions $\hat{Y}$ will typically lie in $[0, 1]$, and we can assign to $\hat{G}$ the class label according to whether $\hat{y} > 0.5$.
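The two-class rule just described, assigning $\hat{G}$ according to whether $\hat{y} > 0.5$, can be sketched as follows. The function name `classify`, the placeholder class labels, and the fitted values are hypothetical; only the 0.5 threshold on a 0/1-coded target comes from the text.

```python
def classify(y_hat, labels=("class 0", "class 1"), threshold=0.5):
    """Map a fitted value for a 0/1-coded target to a class label."""
    # The label coded as 1 is chosen only when y_hat strictly exceeds the threshold.
    return labels[1] if y_hat > threshold else labels[0]


# Made-up fitted values standing in for the output of some regression fit.
fitted = [0.10, 0.49, 0.51, 0.93]
predicted = [classify(value) for value in fitted]
print(predicted)
```

How a fitted value of exactly 0.5 is handled is a convention; this sketch sends it to the class coded as 0.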

This approach generalizes to K-level qualitative outputs as well.

We need data to construct prediction rules, often a lot of it. We thus suppose we have available a set of measurements $(x_i, y_i)$ or $(x_i, g_i)$, $i = 1, \ldots, N$, known as the training data, with which to construct our prediction rule.

2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors

In this section we develop two simple but powerful prediction methods: the linear model fit by least squares and the k-nearest-neighbor prediction rule. The linear model makes huge assumptions about structure and yields stable but possibly inaccurate predictions.

The method of k-nearest neighbors makes very mild structural assumptions: its predictions are often accurate but can be unstable.

2.3.1 Linear Models and Least Squares

The linear model has been a mainstay of statistics for the past 30 years and remains one of our most important tools. Given a vector of inputs $X^T = (X_1, X_2, \ldots, X_p)$, we predict the output $Y$ via the model

$$\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j. \tag{2.1}$$

The term $\hat{\beta}_0$ is the intercept, also known as the bias in machine learning. Often it is convenient to include the constant variable 1 in $X$, include $\hat{\beta}_0$ in the vector of coefficients $\hat{\beta}$, and then write the linear model in vector form as an inner product

$$\hat{Y} = X^T \hat{\beta}, \tag{2.2}$$

where $X^T$ denotes vector or matrix transpose ($X$ being a column vector). Here we are modeling a single output, so $\hat{Y}$ is a scalar; in general $\hat{Y}$ can be a K-vector, in which case $\beta$ would be a $p \times K$ matrix of coefficients.
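As a quick check that the intercept form (2.1) and the inner-product form (2.2) agree once the constant 1 is prepended to the input, consider the sketch below. The coefficient and input values are arbitrary, and the function names are illustrative only.

```python
def predict_with_intercept(x, beta0, beta):
    """Equation (2.1): beta0 plus the sum over j of x_j * beta_j."""
    return beta0 + sum(xj * bj for xj, bj in zip(x, beta))


def predict_inner_product(x_aug, beta_aug):
    """Equation (2.2): inner product of the augmented input and coefficient vectors."""
    return sum(xj * bj for xj, bj in zip(x_aug, beta_aug))


# Arbitrary illustrative values with p = 3 inputs.
x = [2.0, -1.0, 0.5]
beta0 = 1.5
beta = [0.5, 2.0, -4.0]

y1 = predict_with_intercept(x, beta0, beta)
# Prepend the constant 1 to x and the intercept to the coefficient vector.
y2 = predict_inner_product([1.0] + x, [beta0] + beta)
print(y1, y2)
```

Both calls return the same prediction, which is why the intercept can be absorbed into the coefficient vector without loss of generality.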

In the $(p + 1)$-dimensional input–output space, $(X, \hat{Y})$ represents a hyperplane. If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$. From now on we assume that the intercept is included in $\hat{\beta}$.

Viewed as a function over the p-dimensional input space, $f(X) = X^T \beta$ is linear, and the gradient $f'(X) = \beta$ is a vector in input space that points in the steepest uphill direction.

How do we fit the linear model to a set of training data? There are many different methods, but by far the most popular is the method of least squares.

In this approach, we pick the coefficients $\beta$ to minimize the residual sum of squares

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2. \tag{2.3}$$

$\mathrm{RSS}(\beta)$ is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique. The solution is easiest to characterize in matrix notation. We can write

$$\mathrm{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta), \tag{2.4}$$

where $\mathbf{X}$ is an $N \times p$ matrix with each row an input vector, and $\mathbf{y}$ is an N-vector of the outputs in the training set. Differentiating w.r.t. $\beta$ we get the normal equations

$$\mathbf{X}^T (\mathbf{y} - \mathbf{X}\beta) = 0. \tag{2.5}$$

If $\mathbf{X}^T \mathbf{X}$ is nonsingular, then the unique solution is given by

$$\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}, \tag{2.6}$$

and the fitted value at the ith input $x_i$ is $\hat{y}_i = \hat{y}(x_i) = x_i^T \hat{\beta}$. At an arbitrary input $x_0$ the prediction is $\hat{y}(x_0) = x_0^T \hat{\beta}$. The entire fitted surface is characterized by the $p$ parameters $\hat{\beta}$.
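The route from the normal equations (2.5) to the solution (2.6) can be illustrated with a small, dependency-free sketch. It solves $\mathbf{X}^T \mathbf{X} \beta = \mathbf{X}^T \mathbf{y}$ by Gaussian elimination rather than forming the inverse explicitly. The helper names, the toy data, and the crude singularity tolerance are all assumptions of this example; in practice a numerical library's least-squares routine would normally be preferred.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[pivot][k]) < 1e-12:  # crude tolerance, chosen for this sketch
            raise ValueError("X^T X is singular: the minimizer is not unique")
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    z = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        tail = sum(M[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (M[k][n] - tail) / M[k][k]
    return z


def least_squares(X, y):
    """Solve the normal equations X^T X beta = X^T y, as in (2.5) and (2.6)."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


def rss(X, y, beta):
    """Residual sum of squares, equation (2.3)."""
    return sum((yi - sum(a * b for a, b in zip(xi, beta))) ** 2
               for xi, yi in zip(X, y))


# Toy data generated from y = 1 + 2x; the first column is the constant 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

beta_hat = least_squares(X, y)
x0 = [1.0, 4.0]  # an arbitrary new input, with the constant included
y0_hat = sum(a * b for a, b in zip(x0, beta_hat))
print(beta_hat, rss(X, y, beta_hat), y0_hat)
```

Because the toy outputs lie exactly on a line, the fitted coefficients recover the intercept and slope used to generate them (up to rounding) and the residual sum of squares is essentially zero. The `ValueError` branch corresponds to the case where $\mathbf{X}^T \mathbf{X}$ is singular and the minimizer is not unique.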
