Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), page 44


In this chapter, we shall restrict attention to linear discriminants, namely those for which the decision surfaces are hyperplanes. To simplify the discussion, we consider first the case of two classes and then investigate the extension to $K > 2$ classes.

4.1.1 Two classes

The simplest representation of a linear discriminant function is obtained by taking a linear function of the input vector so that

$$y(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}\mathbf{x} + w_0 \tag{4.4}$$

where $\mathbf{w}$ is called a weight vector, and $w_0$ is a bias (not to be confused with bias in the statistical sense). The negative of the bias is sometimes called a threshold. An input vector $\mathbf{x}$ is assigned to class $\mathcal{C}_1$ if $y(\mathbf{x}) \geqslant 0$ and to class $\mathcal{C}_2$ otherwise.
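The two-class decision rule based on (4.4) can be sketched in a few lines of Python. This sketch assumes NumPy is available and uses hypothetical parameter values $\mathbf{w} = (1, -2)^{\mathrm{T}}$ and $w_0 = 0.5$. The function names are illustrative and do not come from the text. Points with $y(\mathbf{x}) = 0$ are assigned to $\mathcal{C}_1$ in this sketch.

```python
import numpy as np

def linear_discriminant(x, w, w0):
    """Evaluate y(x) = w^T x + w0, eq. (4.4)."""
    return float(np.dot(w, x) + w0)

def classify_two_class(x, w, w0):
    """Return 1 (class C1) if y(x) >= 0, otherwise 2 (class C2)."""
    return 1 if linear_discriminant(x, w, w0) >= 0 else 2

# Hypothetical parameters for a D = 2 input space.
w = np.array([1.0, -2.0])
w0 = 0.5

print(classify_two_class(np.array([3.0, 1.0]), w, w0))  # y = 3 - 2 + 0.5 = 1.5  -> 1
print(classify_two_class(np.array([0.0, 1.0]), w, w0))  # y = 0 - 2 + 0.5 = -1.5 -> 2
```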

The corresponding decision boundary is therefore defined by the relation $y(\mathbf{x}) = 0$, which corresponds to a $(D-1)$-dimensional hyperplane within the $D$-dimensional input space. Consider two points $\mathbf{x}_A$ and $\mathbf{x}_B$ both of which lie on the decision surface. Because $y(\mathbf{x}_A) = y(\mathbf{x}_B) = 0$, we have $\mathbf{w}^{\mathrm{T}}(\mathbf{x}_A - \mathbf{x}_B) = 0$ and hence the vector $\mathbf{w}$ is orthogonal to every vector lying within the decision surface, and so $\mathbf{w}$ determines the orientation of the decision surface. Similarly, if $\mathbf{x}$ is a point on the decision surface, then $y(\mathbf{x}) = 0$, and so the normal distance from the origin to the decision surface is given by

$$\frac{\mathbf{w}^{\mathrm{T}}\mathbf{x}}{\|\mathbf{w}\|} = -\frac{w_0}{\|\mathbf{w}\|}. \tag{4.5}$$

We therefore see that the bias parameter $w_0$ determines the location of the decision surface.
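Both geometric facts can be checked numerically: $\mathbf{w}$ is orthogonal to vectors lying in the surface, and the normal distance from the origin is given by (4.5). The sketch below assumes NumPy and a hypothetical surface $3x_1 + 4x_2 - 10 = 0$. On this surface the points $\mathbf{x}_A = (2, 1)^{\mathrm{T}}$ and $\mathbf{x}_B = (-2, 4)^{\mathrm{T}}$ were chosen by hand. The integer values keep the arithmetic exact.

```python
import numpy as np

# Hypothetical parameters: decision surface 3*x1 + 4*x2 - 10 = 0.
w = np.array([3.0, 4.0])
w0 = -10.0

def y(x):
    return float(w @ x + w0)

# Two hypothetical points chosen to lie on the decision surface y(x) = 0.
xA = np.array([2.0, 1.0])
xB = np.array([-2.0, 4.0])
assert y(xA) == 0.0 and y(xB) == 0.0

# w is orthogonal to any vector lying within the surface.
print(w @ (xA - xB))                      # 0.0

# Normal distance from the origin to the surface, eq. (4.5).
norm_w = np.linalg.norm(w)
print(w @ xA / norm_w, -w0 / norm_w)      # 2.0 2.0
```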

These properties are illustrated for the case of $D = 2$ in Figure 4.1.

Furthermore, we note that the value of $y(\mathbf{x})$ gives a signed measure of the perpendicular distance $r$ of the point $\mathbf{x}$ from the decision surface. To see this, consider an arbitrary point $\mathbf{x}$ and let $\mathbf{x}_\perp$ be its orthogonal projection onto the decision surface, so that

$$\mathbf{x} = \mathbf{x}_\perp + r\frac{\mathbf{w}}{\|\mathbf{w}\|}. \tag{4.6}$$

Multiplying both sides of this result by $\mathbf{w}^{\mathrm{T}}$ and adding $w_0$, and making use of $y(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}\mathbf{x} + w_0$ and $y(\mathbf{x}_\perp) = \mathbf{w}^{\mathrm{T}}\mathbf{x}_\perp + w_0 = 0$, we have

$$r = \frac{y(\mathbf{x})}{\|\mathbf{w}\|}. \tag{4.7}$$

This result is illustrated in Figure 4.1.

Figure 4.1 Illustration of the geometry of a linear discriminant function in two dimensions (axes $x_1$, $x_2$; region $\mathcal{R}_1$ where $y > 0$, region $\mathcal{R}_2$ where $y < 0$). The decision surface $y = 0$, shown in red, is perpendicular to $\mathbf{w}$, and its displacement from the origin is controlled by the bias parameter $w_0$. Also, the signed orthogonal distance of a general point $\mathbf{x}$ from the decision surface is given by $y(\mathbf{x})/\|\mathbf{w}\|$.

As with the linear regression models in Chapter 3, it is sometimes convenient to use a more compact notation. In this notation we introduce an additional dummy 'input' value $x_0 = 1$ and then define $\widetilde{\mathbf{w}} = (w_0, \mathbf{w})$ and $\widetilde{\mathbf{x}} = (x_0, \mathbf{x})$ so that

$$y(\mathbf{x}) = \widetilde{\mathbf{w}}^{\mathrm{T}}\widetilde{\mathbf{x}}. \tag{4.8}$$

In this case, the decision surfaces are $D$-dimensional hyperplanes passing through the origin of the $(D+1)$-dimensional expanded input space.

4.1.2 Multiple classes

Now consider the extension of linear discriminants to $K > 2$ classes.
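The signed distance (4.7) and the projection $\mathbf{x}_\perp$ from (4.6) translate directly into code, as does the augmented form (4.8). The sketch below assumes NumPy and reuses a hypothetical surface with $\mathbf{w} = (3, 4)^{\mathrm{T}}$ and $w_0 = -10$. The projection is obtained by solving (4.6) for $\mathbf{x}_\perp$.

```python
import numpy as np

w = np.array([3.0, 4.0])   # hypothetical weight vector, ||w|| = 5
w0 = -10.0                 # hypothetical bias

def signed_distance(x):
    """r = y(x) / ||w||, eq. (4.7); positive on the side that w points towards."""
    return (w @ x + w0) / np.linalg.norm(w)

def project_onto_surface(x):
    """x_perp = x - r w / ||w||, i.e. eq. (4.6) solved for x_perp."""
    return x - signed_distance(x) * w / np.linalg.norm(w)

x = np.array([5.0, 5.0])
x_perp = project_onto_surface(x)
print(signed_distance(x))   # (15 + 20 - 10) / 5 = 5.0
print(w @ x_perp + w0)      # y(x_perp) = 0, up to rounding

# Augmented notation, eq. (4.8): prepend x0 = 1 to x and w0 to w.
w_tilde = np.concatenate(([w0], w))
x_tilde = np.concatenate(([1.0], x))
print(np.isclose(w_tilde @ x_tilde, w @ x + w0))  # True
```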

We might be tempted to build a $K$-class discriminant by combining a number of two-class discriminant functions. However, this leads to some serious difficulties (Duda and Hart, 1973), as we now show.

Consider the use of $K-1$ classifiers, each of which solves a two-class problem of separating points in a particular class $\mathcal{C}_k$ from points not in that class. This is known as a one-versus-the-rest classifier. The left-hand diagram in Figure 4.2 shows an example involving three classes where this approach leads to regions of input space that are ambiguously classified.

Figure 4.2 Attempting to construct a $K$ class discriminant from a set of two class discriminants leads to ambiguous regions, shown in green (marked "?"). On the left is an example involving the use of two discriminants designed to distinguish points in class $\mathcal{C}_k$ from points not in class $\mathcal{C}_k$ (boundaries "$\mathcal{C}_1$ / not $\mathcal{C}_1$" and "$\mathcal{C}_2$ / not $\mathcal{C}_2$"; regions $\mathcal{R}_1$, $\mathcal{R}_2$, $\mathcal{R}_3$). On the right is an example involving three discriminant functions, each of which is used to separate a pair of classes $\mathcal{C}_k$ and $\mathcal{C}_j$.

An alternative is to introduce $K(K-1)/2$ binary discriminant functions, one for every possible pair of classes.

This is known as a one-versus-one classifier. Each point is then classified according to a majority vote amongst the discriminant functions. However, this too runs into the problem of ambiguous regions, as illustrated in the right-hand diagram of Figure 4.2.

We can avoid these difficulties by considering a single $K$-class discriminant comprising $K$ linear functions of the form

$$y_k(\mathbf{x}) = \mathbf{w}_k^{\mathrm{T}}\mathbf{x} + w_{k0} \tag{4.9}$$

and then assigning a point $\mathbf{x}$ to class $\mathcal{C}_k$ if $y_k(\mathbf{x}) > y_j(\mathbf{x})$ for all $j \neq k$.
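The three schemes can be contrasted on a single query point. The following is a minimal sketch, assuming NumPy and $K = 3$, and all weights in it are hypothetical. In the one-versus-the-rest scheme, $y_1(\mathbf{x}) = x_1$ and $y_2(\mathbf{x}) = x_2$, and a point rejected by both tests goes to $\mathcal{C}_3$. The one-versus-one scheme uses three pairwise discriminants decided by majority vote. The single $K$-class discriminant (4.9) uses its own unrelated weights. The example only shows that the first two schemes can leave a point unresolved, while the argmax rule always returns one class.

```python
import numpy as np
from collections import Counter

x = np.array([2.0, 1.0])   # hypothetical query point

# One-versus-the-rest, K = 3 with K - 1 = 2 hypothetical discriminants:
# y_1(x) = x1 separates C1 from "not C1"; y_2(x) = x2 separates C2 from "not C2".
W_ovr = np.array([[1.0, 0.0], [0.0, 1.0]])

def one_vs_rest(x):
    claims = W_ovr @ x > 0                    # which "C_k vs rest" tests claim x
    if claims.sum() == 0:
        return "C3"                            # rejected by both: remaining class
    if claims.sum() == 1:
        return f"C{int(np.argmax(claims)) + 1}"
    return "ambiguous"                         # claimed by more than one test

# One-versus-one, K(K-1)/2 = 3 hypothetical pairwise discriminants (no biases):
# y_kj(x) > 0 is a vote for C_k, otherwise a vote for C_j.
pairwise = {(1, 2): np.array([1.0, 0.0]),
            (2, 3): np.array([0.0, 1.0]),
            (1, 3): np.array([-1.0, -1.0])}

def one_vs_one(x):
    votes = Counter(k if w @ x > 0 else j for (k, j), w in pairwise.items())
    top = max(votes.values())
    winners = [c for c, v in votes.items() if v == top]
    return f"C{winners[0]}" if len(winners) == 1 else "ambiguous"

# Single K-class discriminant, eq. (4.9): pick the largest y_k(x) = w_k^T x + w_k0.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # hypothetical rows w_k^T
w0 = np.array([0.0, 0.0, 0.5])                         # hypothetical biases w_k0

def k_class(x):
    return f"C{int(np.argmax(W @ x + w0)) + 1}"

print(one_vs_rest(x), one_vs_one(x), k_class(x))   # ambiguous ambiguous C1
```

On the ambiguous point, both tests in the one-versus-the-rest scheme claim $\mathbf{x}$. In the one-versus-one scheme, each class receives exactly one vote.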

The decision boundary between class $\mathcal{C}_k$ and class $\mathcal{C}_j$ is therefore given by $y_k(\mathbf{x}) = y_j(\mathbf{x})$ and hence corresponds to a $(D-1)$-dimensional hyperplane defined by

$$(\mathbf{w}_k - \mathbf{w}_j)^{\mathrm{T}}\mathbf{x} + (w_{k0} - w_{j0}) = 0. \tag{4.10}$$

This has the same form as the decision boundary for the two-class case discussed in Section 4.1.1, and so analogous geometrical properties apply.

Figure 4.3 Illustration of the decision regions for a multiclass linear discriminant, with the decision boundaries shown in red (regions $\mathcal{R}_i$, $\mathcal{R}_j$, $\mathcal{R}_k$). If two points $\mathbf{x}_A$ and $\mathbf{x}_B$ both lie inside the same decision region $\mathcal{R}_k$, then any point $\widehat{\mathbf{x}}$ that lies on the line connecting these two points must also lie in $\mathcal{R}_k$, and hence the decision region must be singly connected and convex.

The decision regions of such a discriminant are always singly connected and convex. To see this, consider two points $\mathbf{x}_A$ and $\mathbf{x}_B$ both of which lie inside decision region $\mathcal{R}_k$, as illustrated in Figure 4.3. Any point $\widehat{\mathbf{x}}$ that lies on the line connecting $\mathbf{x}_A$ and $\mathbf{x}_B$ can be expressed in the form

$$\widehat{\mathbf{x}} = \lambda\mathbf{x}_A + (1-\lambda)\mathbf{x}_B \tag{4.11}$$

where $0 \leqslant \lambda \leqslant 1$. From the linearity of the discriminant functions, it follows that

$$y_k(\widehat{\mathbf{x}}) = \lambda y_k(\mathbf{x}_A) + (1-\lambda) y_k(\mathbf{x}_B). \tag{4.12}$$

Because both $\mathbf{x}_A$ and $\mathbf{x}_B$ lie inside $\mathcal{R}_k$, it follows that $y_k(\mathbf{x}_A) > y_j(\mathbf{x}_A)$ and $y_k(\mathbf{x}_B) > y_j(\mathbf{x}_B)$ for all $j \neq k$. Hence $y_k(\widehat{\mathbf{x}}) > y_j(\widehat{\mathbf{x}})$, and so $\widehat{\mathbf{x}}$ also lies inside $\mathcal{R}_k$.
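The convexity argument based on (4.11) and (4.12) can be spot-checked numerically. The following is a minimal sketch, assuming NumPy, hypothetical weights for $K = 3$ classes, and two hand-picked points that both fall in $\mathcal{R}_1$. It confirms linearity and region membership along the segment for a grid of $\lambda$ values. This is a numerical check, not a proof.

```python
import numpy as np

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # hypothetical rows w_k^T
w0 = np.array([0.0, 0.0, 0.5])                         # hypothetical biases w_k0

def region(x):
    """0-based index k of the winning discriminant y_k(x) = w_k^T x + w_k0."""
    return int(np.argmax(W @ x + w0))

xA = np.array([3.0, 0.5])    # hypothetical point, y = (3, 0.5, -3)
xB = np.array([2.0, -1.0])   # hypothetical point, y = (2, -1, -0.5)
k = region(xA)
assert region(xB) == k

for lam in np.linspace(0.0, 1.0, 11):
    x_hat = lam * xA + (1 - lam) * xB                         # eq. (4.11)
    yA, yB, y_hat = W @ xA + w0, W @ xB + w0, W @ x_hat + w0
    assert np.allclose(y_hat, lam * yA + (1 - lam) * yB)      # eq. (4.12), all k
    assert region(x_hat) == k                                  # x_hat stays in R_k
print("all interpolated points lie in region", k + 1)         # region 1
```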

Thus $\mathcal{R}_k$ is singly connected and convex.

Note that for two classes, we can either employ the formalism discussed here, based on two discriminant functions $y_1(\mathbf{x})$ and $y_2(\mathbf{x})$, or else use the simpler but equivalent formulation described in Section 4.1.1 based on a single discriminant function $y(\mathbf{x})$.

We now explore three approaches to learning the parameters of linear discriminant functions, based on least squares, Fisher's linear discriminant, and the perceptron algorithm.

4.1.3 Least squares for classification

In Chapter 3, we considered models that were linear functions of the parameters, and we saw that the minimization of a sum-of-squares error function led to a simple closed-form solution for the parameter values. It is therefore tempting to see if we can apply the same formalism to classification problems.

Consider a general classification problem with $K$ classes, with a 1-of-$K$ binary coding scheme for the target vector $\mathbf{t}$. One justification for using least squares in such a context is that it approximates the conditional expectation $\mathbb{E}[\mathbf{t}|\mathbf{x}]$ of the target values given the input vector. For the binary coding scheme, this conditional expectation is given by the vector of posterior class probabilities.

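Unfortunately, however, these probabilities are typically approximated rather poorly. Indeed, the approximations can have values outside the range (0, 1), due to the limited flexibility of a linear model, as we shall see shortly.

Each class $\mathcal{C}_k$ is described by its own linear model so that

$$y_k(\mathbf{x}) = \mathbf{w}_k^{\mathrm{T}}\mathbf{x} + w_{k0} \tag{4.13}$$

where $k = 1, \ldots, K$. We can conveniently group these together using vector notation so that

$$\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^{\mathrm{T}}\widetilde{\mathbf{x}} \tag{4.14}$$

where $\widetilde{\mathbf{W}}$ is a matrix whose $k$th column comprises the $(D+1)$-dimensional vector $\widetilde{\mathbf{w}}_k = (w_{k0}, \mathbf{w}_k^{\mathrm{T}})^{\mathrm{T}}$. Here $\widetilde{\mathbf{x}}$ is the corresponding augmented input vector $(1, \mathbf{x}^{\mathrm{T}})^{\mathrm{T}}$ with a dummy input $x_0 = 1$. This representation was discussed in detail in Section 3.1. A new input $\mathbf{x}$ is then assigned to the class for which the output $y_k = \widetilde{\mathbf{w}}_k^{\mathrm{T}}\widetilde{\mathbf{x}}$ is largest.

We now determine the parameter matrix $\widetilde{\mathbf{W}}$ by minimizing a sum-of-squares error function, as we did for regression in Chapter 3. Consider a training data set $\{\mathbf{x}_n, \mathbf{t}_n\}$ where $n = 1, \ldots, N$. Define a matrix $\mathbf{T}$ whose $n$th row is the vector $\mathbf{t}_n^{\mathrm{T}}$, together with a matrix $\widetilde{\mathbf{X}}$ whose $n$th row is $\widetilde{\mathbf{x}}_n^{\mathrm{T}}$. The sum-of-squares error function can then be written as

$$E_D(\widetilde{\mathbf{W}}) = \frac{1}{2}\operatorname{Tr}\left\{(\widetilde{\mathbf{X}}\widetilde{\mathbf{W}} - \mathbf{T})^{\mathrm{T}}(\widetilde{\mathbf{X}}\widetilde{\mathbf{W}} - \mathbf{T})\right\}. \tag{4.15}$$

Setting the derivative with respect to $\widetilde{\mathbf{W}}$ to zero, and rearranging, we then obtain the solution for $\widetilde{\mathbf{W}}$ in the form

$$\widetilde{\mathbf{W}} = (\widetilde{\mathbf{X}}^{\mathrm{T}}\widetilde{\mathbf{X}})^{-1}\widetilde{\mathbf{X}}^{\mathrm{T}}\mathbf{T} = \widetilde{\mathbf{X}}^{\dagger}\mathbf{T} \tag{4.16}$$

where $\widetilde{\mathbf{X}}^{\dagger}$ is the pseudo-inverse of the matrix $\widetilde{\mathbf{X}}$, as discussed in Section 3.1.1. We then obtain the discriminant function in the form

$$\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^{\mathrm{T}}\widetilde{\mathbf{x}} = \mathbf{T}^{\mathrm{T}}\left(\widetilde{\mathbf{X}}^{\dagger}\right)^{\mathrm{T}}\widetilde{\mathbf{x}}. \tag{4.17}$$

An interesting property of least-squares solutions with multiple target variables concerns linear constraints. Suppose every target vector in the training set satisfies some linear constraint

$$\mathbf{a}^{\mathrm{T}}\mathbf{t}_n + b = 0 \tag{4.18}$$

for some constants $\mathbf{a}$ and $b$. Then the model prediction for any value of $\mathbf{x}$ will satisfy the same constraint, so that

$$\mathbf{a}^{\mathrm{T}}\mathbf{y}(\mathbf{x}) + b = 0. \tag{4.19}$$

[Margin notes: Exercise 4.2; Section 2.3.7]

Thus if we use a 1-of-$K$ coding scheme for $K$ classes, then the elements of the model prediction $\mathbf{y}(\mathbf{x})$ will sum to 1 for any value of $\mathbf{x}$.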
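The least-squares procedure in (4.14) to (4.17) can be sketched with NumPy's pseudo-inverse. This is a minimal sketch under several assumptions. The data are three hypothetical Gaussian blobs generated only for illustration. Class labels are 0-based. The function names are illustrative. For 1-of-$K$ targets, (4.18) holds with $\mathbf{a} = (1, \ldots, 1)^{\mathrm{T}}$ and $b = -1$, so each predicted output vector should sum to 1 up to rounding. Its individual entries are not constrained to (0, 1).

```python
import numpy as np

def fit_least_squares_classifier(X, labels, K):
    """Return W_tilde = pinv(X_tilde) @ T, eq. (4.16), for 1-of-K targets."""
    N = X.shape[0]
    X_tilde = np.hstack([np.ones((N, 1)), X])   # rows (1, x_n^T)
    T = np.eye(K)[labels]                         # rows t_n^T, 1-of-K coding
    return np.linalg.pinv(X_tilde) @ T            # shape (D + 1, K)

def predict_outputs(W_tilde, X):
    """y(x) = W_tilde^T x_tilde, eq. (4.14), evaluated for each row of X."""
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    return X_tilde @ W_tilde                      # row n is y(x_n)^T

# Hypothetical data: three Gaussian blobs in 2-D, 50 points each.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
labels = np.repeat(np.arange(3), 50)
X = means[labels] + rng.normal(scale=0.5, size=(150, 2))

W_tilde = fit_least_squares_classifier(X, labels, K=3)
X_new = rng.normal(scale=5.0, size=(5, 2))        # arbitrary new inputs
Y = predict_outputs(W_tilde, X_new)
print(Y.sum(axis=1))        # each row sums to 1 (up to rounding), eq. (4.19)
print(Y.argmax(axis=1))     # assigned classes: the largest output y_k
```

The pseudo-inverse is used here instead of forming $(\widetilde{\mathbf{X}}^{\mathrm{T}}\widetilde{\mathbf{X}})^{-1}$ explicitly. When $\widetilde{\mathbf{X}}$ has full column rank, both forms in (4.16) give the same matrix.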
