Bishop C.M., Pattern Recognition and Machine Learning (2006)
However, this summation constraint alone is not sufficient to allow the model outputs to be interpreted as probabilities because they are not constrained to lie within the interval (0, 1).

The least-squares approach gives an exact closed-form solution for the discriminant function parameters.
However, even as a discriminant function (where we use it to make decisions directly and dispense with any probabilistic interpretation) it suffers from some severe problems. We have already seen that least-squares solutions lack robustness to outliers, and this applies equally to the classification application, as illustrated in Figure 4.4. Here we see that the additional data points in the right-hand figure produce a significant change in the location of the decision boundary, even though these points would be correctly classified by the original decision boundary in the left-hand figure.
Figure 4.4 The left plot shows data from two classes, denoted by red crosses and blue circles, together with the decision boundary found by least squares (magenta curve) and also by the logistic regression model (green curve), which is discussed later in Section 4.3.2. The right-hand plot shows the corresponding results obtained when extra data points are added at the bottom left of the diagram, showing that least squares is highly sensitive to outliers, unlike logistic regression.

The sum-of-squares error function penalizes predictions that are ‘too correct’ in that they lie a long way on the correct side of the decision boundary.
In Section 7.1.2, we shall consider several alternative error functions for classification and we shall see that they do not suffer from this difficulty.

However, problems with least squares can be more severe than simply lack of robustness, as illustrated in Figure 4.5. This shows a synthetic data set drawn from three classes in a two-dimensional input space $(x_1, x_2)$, having the property that linear decision boundaries can give excellent separation between the classes. Indeed, the technique of logistic regression, described later in this chapter, gives a satisfactory solution as seen in the right-hand plot.
However, the least-squares solution gives poor results, with only a small region of the input space assigned to the green class.

The failure of least squares should not surprise us when we recall that it corresponds to maximum likelihood under the assumption of a Gaussian conditional distribution, whereas binary target vectors clearly have a distribution that is far from Gaussian. By adopting more appropriate probabilistic models, we shall obtain classification techniques with much better properties than least squares.

Figure 4.5 Example of a synthetic data set comprising three classes, with training data points denoted in red (×), green (+), and blue (◦). Lines denote the decision boundaries, and the background colours denote the respective classes of the decision regions. On the left is the result of using a least-squares discriminant. We see that the region of input space assigned to the green class is too small and so most of the points from this class are misclassified. On the right is the result of using logistic regression as described in Section 4.3.2, showing correct classification of the training data.
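To see this sensitivity numerically, here is a small sketch (a hypothetical Python/NumPy illustration, not taken from the book): it fits a least-squares discriminant with 1-of-K targets to two synthetic Gaussian blobs, then appends a cluster of extra points that already lie well on the correct side of the boundary and refits, printing the slope and intercept of the decision boundary in both cases so the shift is visible.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstsq_discriminant(X, labels):
    """Least-squares fit of y(x) = W^T x_tilde with 1-of-K target vectors.
    Returns the augmented weight matrix W (bias row first)."""
    X_tilde = np.hstack([np.ones((len(X), 1)), X])      # prepend the dummy input x0 = 1
    T = np.eye(2)[labels]                               # 1-of-K coding of the targets
    W, *_ = np.linalg.lstsq(X_tilde, T, rcond=None)     # exact closed-form solution
    return W

def boundary(W):
    """Decision boundary y1(x) = y2(x), returned as (slope, intercept) in x2 = a*x1 + b."""
    d = W[:, 0] - W[:, 1]                               # difference of the two discriminants
    return -d[1] / d[2], -d[0] / d[2]

# two well-separated Gaussian blobs
X = np.vstack([rng.normal([-1.0, 1.0], 0.6, (40, 2)),
               rng.normal([1.0, -1.0], 0.6, (40, 2))])
labels = np.array([0] * 40 + [1] * 40)
print("boundary without extra points:", boundary(lstsq_discriminant(X, labels)))

# extra points that lie far on the *correct* side of the boundary for class 1
X_extra = np.vstack([X, rng.normal([7.0, -6.0], 0.5, (15, 2))])
labels_extra = np.concatenate([labels, np.full(15, 1)])
print("boundary with extra points:   ", boundary(lstsq_discriminant(X_extra, labels_extra)))
```

Even though the extra points are classified correctly by the original boundary, the printed slope and intercept change noticeably after they are added, mirroring the behaviour in Figure 4.4.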
For the moment, however, we continue to explore alternative nonprobabilistic methods for setting the parameters in the linear classification models.

4.1.4 Fisher’s linear discriminant

One way to view a linear classification model is in terms of dimensionality reduction. Consider first the case of two classes, and suppose we take the D-dimensional input vector x and project it down to one dimension using

$$y = \mathbf{w}^{\mathrm{T}}\mathbf{x}. \tag{4.20}$$

If we place a threshold on y and classify $y \geqslant -w_0$ as class C1, and otherwise class C2, then we obtain our standard linear classifier discussed in the previous section. In general, the projection onto one dimension leads to a considerable loss of information, and classes that are well separated in the original D-dimensional space may become strongly overlapping in one dimension.
However, by adjusting the components of the weight vector w, we can select a projection that maximizes the class separation. To begin with, consider a two-class problem in which there are $N_1$ points of class C1 and $N_2$ points of class C2, so that the mean vectors of the two classes are given by

$$\mathbf{m}_1 = \frac{1}{N_1}\sum_{n \in \mathcal{C}_1} \mathbf{x}_n, \qquad \mathbf{m}_2 = \frac{1}{N_2}\sum_{n \in \mathcal{C}_2} \mathbf{x}_n. \tag{4.21}$$

The simplest measure of the separation of the classes, when projected onto w, is the separation of the projected class means. This suggests that we might choose w so as to maximize

$$m_2 - m_1 = \mathbf{w}^{\mathrm{T}}(\mathbf{m}_2 - \mathbf{m}_1) \tag{4.22}$$

where

$$m_k = \mathbf{w}^{\mathrm{T}}\mathbf{m}_k \tag{4.23}$$

is the mean of the projected data from class Ck. However, this expression can be made arbitrarily large simply by increasing the magnitude of w. To solve this problem, we could constrain w to have unit length, so that $\sum_i w_i^2 = 1$.
Using a Lagrange multiplier to perform the constrained maximization (Appendix E), we then find that w ∝ (m2 − m1) (Exercise 4.4). There is still a problem with this approach, however, as illustrated in Figure 4.6. This shows two classes that are well separated in the original two-dimensional space $(x_1, x_2)$ but that have considerable overlap when projected onto the line joining their means. This difficulty arises from the strongly nondiagonal covariances of the class distributions.

Figure 4.6 The left plot shows samples from two classes (depicted in red and blue) along with the histograms resulting from projection onto the line joining the class means. Note that there is considerable class overlap in the projected space. The right plot shows the corresponding projection based on the Fisher linear discriminant, showing the greatly improved class separation.
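The effect of such correlated class covariances is easy to reproduce numerically. The sketch below (my own synthetic NumPy example, not the data of Figure 4.6) projects two strongly correlated classes onto the unit vector along m2 − m1 and prints the projected class means and within-class standard deviations; the separation of the projected means should come out comparable to the within-class spread, which is the overlap described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# two classes that are well separated in 2D but have strongly non-diagonal covariance
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, 200)
X2 = rng.multivariate_normal([2.0, 0.5], cov, 200)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)            # class means, as in (4.21)

# unit-length projection onto the line joining the class means
w = (m2 - m1) / np.linalg.norm(m2 - m1)
y1, y2 = X1 @ w, X2 @ w                              # projected samples y_n = w^T x_n

print("separation of projected means:", y2.mean() - y1.mean())   # compare with (4.22)
print("within-class standard deviations:", y1.std(), y2.std())   # comparable in size
```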
The idea proposed by Fisher is to maximize a function that will give a large separation between the projected class means while also giving a small variance within each class, thereby minimizing the class overlap. The projection formula (4.20) transforms the set of labelled data points in x into a labelled set in the one-dimensional space y. The within-class variance of the transformed data from class Ck is therefore given by

$$s_k^2 = \sum_{n \in \mathcal{C}_k} (y_n - m_k)^2 \tag{4.24}$$

where $y_n = \mathbf{w}^{\mathrm{T}}\mathbf{x}_n$. We can define the total within-class variance for the whole data set to be simply $s_1^2 + s_2^2$.
The Fisher criterion is defined to be the ratio of the between-class variance to the within-class variance and is given by

$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}. \tag{4.25}$$

We can make the dependence on w explicit by using (4.20), (4.23), and (4.24) to rewrite the Fisher criterion in the form (Exercise 4.5)

$$J(\mathbf{w}) = \frac{\mathbf{w}^{\mathrm{T}}\mathbf{S}_{\mathrm{B}}\mathbf{w}}{\mathbf{w}^{\mathrm{T}}\mathbf{S}_{\mathrm{W}}\mathbf{w}} \tag{4.26}$$

where $\mathbf{S}_{\mathrm{B}}$ is the between-class covariance matrix and is given by

$$\mathbf{S}_{\mathrm{B}} = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^{\mathrm{T}} \tag{4.27}$$

and $\mathbf{S}_{\mathrm{W}}$ is the total within-class covariance matrix, given by

$$\mathbf{S}_{\mathrm{W}} = \sum_{n \in \mathcal{C}_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^{\mathrm{T}} + \sum_{n \in \mathcal{C}_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^{\mathrm{T}}. \tag{4.28}$$

Differentiating (4.26) with respect to w, we find that $J(\mathbf{w})$ is maximized when

$$(\mathbf{w}^{\mathrm{T}}\mathbf{S}_{\mathrm{B}}\mathbf{w})\,\mathbf{S}_{\mathrm{W}}\mathbf{w} = (\mathbf{w}^{\mathrm{T}}\mathbf{S}_{\mathrm{W}}\mathbf{w})\,\mathbf{S}_{\mathrm{B}}\mathbf{w}. \tag{4.29}$$

From (4.27), we see that $\mathbf{S}_{\mathrm{B}}\mathbf{w}$ is always in the direction of (m2 − m1).
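As a quick sanity check (my own NumPy sketch with synthetic data, not part of the text), the two forms (4.25) and (4.26) of the Fisher criterion can be evaluated for an arbitrary direction w; they agree to numerical precision because (4.26) is simply (4.25) rewritten using (4.20), (4.23), and (4.24).

```python
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal([0.0, 0.0], [1.0, 0.5], (100, 2))    # samples from class C1
X2 = rng.normal([2.0, 1.0], [1.0, 0.5], (100, 2))    # samples from class C2
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
w = rng.normal(size=2)                               # an arbitrary projection direction

# form (4.25): projected means (4.23) and within-class variances (4.24)
y1, y2 = X1 @ w, X2 @ w
s1_sq = ((y1 - y1.mean()) ** 2).sum()
s2_sq = ((y2 - y2.mean()) ** 2).sum()
J_projected = (y2.mean() - y1.mean()) ** 2 / (s1_sq + s2_sq)

# form (4.26): quadratic forms with S_B from (4.27) and S_W from (4.28)
S_B = np.outer(m2 - m1, m2 - m1)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
J_quadratic = (w @ S_B @ w) / (w @ S_W @ w)

print(J_projected, J_quadratic)                      # agree up to floating-point rounding
```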
Furthermore, we do not care about the magnitude of w, only its direction, and so we can drop the scalar factors $(\mathbf{w}^{\mathrm{T}}\mathbf{S}_{\mathrm{B}}\mathbf{w})$ and $(\mathbf{w}^{\mathrm{T}}\mathbf{S}_{\mathrm{W}}\mathbf{w})$. Multiplying both sides of (4.29) by $\mathbf{S}_{\mathrm{W}}^{-1}$ we then obtain

$$\mathbf{w} \propto \mathbf{S}_{\mathrm{W}}^{-1}(\mathbf{m}_2 - \mathbf{m}_1). \tag{4.30}$$

Note that if the within-class covariance is isotropic, so that $\mathbf{S}_{\mathrm{W}}$ is proportional to the unit matrix, we find that w is proportional to the difference of the class means, as discussed above.

The result (4.30) is known as Fisher’s linear discriminant, although strictly it is not a discriminant but rather a specific choice of direction for projection of the data down to one dimension.
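To make (4.30) concrete, the following sketch (again my own synthetic NumPy example) computes the Fisher direction by solving $\mathbf{S}_{\mathrm{W}}\mathbf{w} = \mathbf{m}_2 - \mathbf{m}_1$ and compares the resulting class separation, measured in units of the projected within-class spread, with the separation obtained from the plain mean-difference direction.

```python
import numpy as np

rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, 300)
X2 = rng.multivariate_normal([2.0, 0.5], cov, 300)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# total within-class covariance (4.28) and the Fisher direction (4.30)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(S_W, m2 - m1)             # proportional to S_W^{-1} (m2 - m1)

def separation(w):
    """Distance between the projected class means in units of the pooled projected std."""
    y1, y2 = X1 @ w, X2 @ w
    return abs(y2.mean() - y1.mean()) / np.sqrt(0.5 * (y1.var() + y2.var()))

print("mean-difference direction:", separation(m2 - m1))
print("Fisher direction:         ", separation(w_fisher))   # noticeably larger
```

Because the criterion (4.26) is unchanged when w is rescaled, w_fisher needs no normalization; any positive multiple of $\mathbf{S}_{\mathrm{W}}^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$ gives the same projection up to scale.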
However, the projected data can subsequently be used to construct a discriminant, by choosing a threshold $y_0$ so that we classify a new point as belonging to C1 if $y(\mathbf{x}) \geqslant y_0$ and classify it as belonging to C2 otherwise. For example, we can model the class-conditional densities $p(y|\mathcal{C}_k)$ using Gaussian distributions and then use the techniques of Section 1.2.4 to find the parameters of the Gaussian distributions by maximum likelihood. Having found Gaussian approximations to the projected classes, the formalism of Section 1.5.1 then gives an expression for the optimal threshold. Some justification for the Gaussian assumption comes from the central limit theorem by noting that $y = \mathbf{w}^{\mathrm{T}}\mathbf{x}$ is the sum of a set of random variables.
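A minimal sketch of that construction follows (my own illustration; the pooled-variance simplification and all variable names are assumptions, not the book's recipe verbatim). It projects synthetic data onto the Fisher direction, fits a Gaussian to each projected class by maximum likelihood with a shared variance, and solves for the threshold $y_0$ at which the two prior-weighted class densities are equal.

```python
import numpy as np

rng = np.random.default_rng(4)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, 300)   # class C1
X2 = rng.multivariate_normal([2.0, 0.5], cov, 200)   # class C2
N1, N2 = len(X1), len(X2)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(S_W, m2 - m1)                    # Fisher direction (4.30)
y1, y2 = X1 @ w, X2 @ w                              # projected data

# maximum-likelihood Gaussian fit to each projected class, pooling the variance
mu1, mu2 = y1.mean(), y2.mean()
var = (((y1 - mu1) ** 2).sum() + ((y2 - mu2) ** 2).sum()) / (N1 + N2)
pi1, pi2 = N1 / (N1 + N2), N2 / (N1 + N2)            # class priors from the sample sizes

# threshold where pi1 N(y | mu1, var) = pi2 N(y | mu2, var); with a shared variance
# the quadratic terms cancel and a single crossing point remains
y0 = 0.5 * (mu1 + mu2) + var * np.log(pi1 / pi2) / (mu2 - mu1)

# classify a projected point as C2 if it exceeds y0 (here mu2 > mu1), else C1
train_error = ((y1 > y0).sum() + (y2 <= y0).sum()) / (N1 + N2)
print("threshold y0 =", y0, " training error =", train_error)
```

With equal priors the threshold sits midway between the projected class means; the unequal sample sizes used here (300 versus 200) shift it toward the mean of the smaller class.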
4.1.5 Relation to least squares

The least-squares approach to the determination of a linear discriminant was based on the goal of making the model predictions as close as possible to a set of target values. By contrast, the Fisher criterion was derived by requiring maximum class separation in the output space. It is interesting to see the relationship between these two approaches. In particular, we shall show that, for the two-class problem, the Fisher criterion can be obtained as a special case of least squares.

So far we have considered 1-of-K coding for the target values. If, however, we adopt a slightly different target coding scheme, then the least-squares solution for …