Bishop C. M., Pattern Recognition and Machine Learning (2006)
If we adopt a slightly different target coding scheme, then the least-squares solution for the weights becomes equivalent to the Fisher solution (Duda and Hart, 1973). In particular, we shall take the targets for class C1 to be N/N1, where N1 is the number of patterns in class C1, and N is the total number of patterns. This target value approximates the reciprocal of the prior probability for class C1.
For class C2, we shall take the targets to be −N/N2, where N2 is the number of patterns in class C2. The sum-of-squares error function can be written

E = \frac{1}{2} \sum_{n=1}^{N} \left( \mathbf{w}^{\mathrm{T}} \mathbf{x}_n + w_0 - t_n \right)^2.   (4.31)

Setting the derivatives of E with respect to w_0 and w to zero, we obtain respectively

\sum_{n=1}^{N} \left( \mathbf{w}^{\mathrm{T}} \mathbf{x}_n + w_0 - t_n \right) = 0   (4.32)

\sum_{n=1}^{N} \left( \mathbf{w}^{\mathrm{T}} \mathbf{x}_n + w_0 - t_n \right) \mathbf{x}_n = 0.   (4.33)

From (4.32), and making use of our choice of target coding scheme for the t_n, we obtain an expression for the bias in the form

w_0 = -\mathbf{w}^{\mathrm{T}} \mathbf{m}   (4.34)

where we have used

\sum_{n=1}^{N} t_n = N_1 \frac{N}{N_1} - N_2 \frac{N}{N_2} = 0   (4.35)

and where m is the mean of the total data set and is given by

\mathbf{m} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n = \frac{1}{N} \left( N_1 \mathbf{m}_1 + N_2 \mathbf{m}_2 \right).   (4.36)

After some straightforward algebra (Exercise 4.6), and again making use of the choice of t_n, the second equation (4.33) becomes

\left( \mathbf{S}_{\mathrm{W}} + \frac{N_1 N_2}{N} \mathbf{S}_{\mathrm{B}} \right) \mathbf{w} = N \left( \mathbf{m}_1 - \mathbf{m}_2 \right)   (4.37)

where S_W is defined by (4.28), S_B is defined by (4.27), and we have substituted for the bias using (4.34).
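One route through the algebra of Exercise 4.6 is the following sketch. Substituting the bias (4.34) into (4.33), and noting that the target coding gives \sum_{n=1}^{N} t_n \mathbf{x}_n = N(\mathbf{m}_1 - \mathbf{m}_2), equation (4.33) becomes

\left( \sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_n^{\mathrm{T}} - N \mathbf{m}\mathbf{m}^{\mathrm{T}} \right) \mathbf{w} = N \left( \mathbf{m}_1 - \mathbf{m}_2 \right).

Expanding (4.28) gives \sum_{n} \mathbf{x}_n \mathbf{x}_n^{\mathrm{T}} = \mathbf{S}_{\mathrm{W}} + N_1 \mathbf{m}_1 \mathbf{m}_1^{\mathrm{T}} + N_2 \mathbf{m}_2 \mathbf{m}_2^{\mathrm{T}}, and using (4.36) together with N = N_1 + N_2 gives

N_1 \mathbf{m}_1 \mathbf{m}_1^{\mathrm{T}} + N_2 \mathbf{m}_2 \mathbf{m}_2^{\mathrm{T}} - N \mathbf{m}\mathbf{m}^{\mathrm{T}} = \frac{N_1 N_2}{N} \left( \mathbf{m}_2 - \mathbf{m}_1 \right)\left( \mathbf{m}_2 - \mathbf{m}_1 \right)^{\mathrm{T}} = \frac{N_1 N_2}{N} \mathbf{S}_{\mathrm{B}}

by (4.27), from which (4.37) follows.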
Using (4.27), we note that S_B w is always in the direction of (m_2 − m_1). Thus we can write

\mathbf{w} \propto \mathbf{S}_{\mathrm{W}}^{-1} \left( \mathbf{m}_2 - \mathbf{m}_1 \right)   (4.38)

where we have ignored irrelevant scale factors. Thus the weight vector coincides with that found from the Fisher criterion. In addition, we have also found an expression for the bias value w_0 given by (4.34). This tells us that a new vector x should be classified as belonging to class C1 if y(x) = w^T(x − m) > 0 and class C2 otherwise.
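To see this equivalence numerically, here is a minimal sketch (assuming NumPy and a small synthetic Gaussian data set; an illustration, not code from the text) that fits the least-squares solution with the ±N/N_k target coding and checks that the resulting weight vector is parallel, up to sign and scale, to S_W^{-1}(m_2 − m_1):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (assumed Gaussian clusters, for illustration only).
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(60, 2))   # class C1
X2 = rng.normal(loc=[3.0, 1.5], scale=1.0, size=(40, 2))   # class C2
X = np.vstack([X1, X2])
N1, N2 = len(X1), len(X2)
N = N1 + N2

# Target coding from the text: +N/N1 for C1, -N/N2 for C2.
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])

# Least-squares solution for (w0, w), with a column of ones for the bias.
Xb = np.hstack([np.ones((N, 1)), X])
w_ls = np.linalg.lstsq(Xb, t, rcond=None)[0][1:]   # drop the bias component

# Fisher direction S_W^{-1} (m2 - m1), up to scale.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(SW, m2 - m1)

# The two directions should be (anti)parallel: |cosine| close to 1.
cos = w_ls @ w_fisher / (np.linalg.norm(w_ls) * np.linalg.norm(w_fisher))
print(abs(cos))   # approximately 1.0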
4.1.6 Fisher's discriminant for multiple classes

We now consider the generalization of the Fisher discriminant to K > 2 classes, and we shall assume that the dimensionality D of the input space is greater than the number K of classes. Next, we introduce D′ > 1 linear 'features' y_k = w_k^T x, where k = 1, ..., D′. These feature values can conveniently be grouped together to form a vector y. Similarly, the weight vectors {w_k} can be considered to be the columns of a matrix W, so that

\mathbf{y} = \mathbf{W}^{\mathrm{T}} \mathbf{x}.   (4.39)

Note that again we are not including any bias parameters in the definition of y. The generalization of the within-class covariance matrix to the case of K classes follows from (4.28) to give

\mathbf{S}_{\mathrm{W}} = \sum_{k=1}^{K} \mathbf{S}_k   (4.40)

where

\mathbf{S}_k = \sum_{n \in \mathcal{C}_k} \left( \mathbf{x}_n - \mathbf{m}_k \right)\left( \mathbf{x}_n - \mathbf{m}_k \right)^{\mathrm{T}}   (4.41)

\mathbf{m}_k = \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} \mathbf{x}_n   (4.42)

and N_k is the number of patterns in class C_k. In order to find a generalization of the between-class covariance matrix, we follow Duda and Hart (1973) and consider first the total covariance matrix

\mathbf{S}_{\mathrm{T}} = \sum_{n=1}^{N} \left( \mathbf{x}_n - \mathbf{m} \right)\left( \mathbf{x}_n - \mathbf{m} \right)^{\mathrm{T}}   (4.43)

where m is the mean of the total data set

\mathbf{m} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n = \frac{1}{N} \sum_{k=1}^{K} N_k \mathbf{m}_k   (4.44)

and N = \sum_k N_k is the total number of data points.
The total covariance matrix can be decomposed into the sum of the within-class covariance matrix, given by (4.40) and (4.41), plus an additional matrix S_B, which we identify as a measure of the between-class covariance

\mathbf{S}_{\mathrm{T}} = \mathbf{S}_{\mathrm{W}} + \mathbf{S}_{\mathrm{B}}   (4.45)

where

\mathbf{S}_{\mathrm{B}} = \sum_{k=1}^{K} N_k \left( \mathbf{m}_k - \mathbf{m} \right)\left( \mathbf{m}_k - \mathbf{m} \right)^{\mathrm{T}}.   (4.46)

These covariance matrices have been defined in the original x-space. We can now define similar matrices in the projected D′-dimensional y-space

\mathbf{s}_{\mathrm{W}} = \sum_{k=1}^{K} \sum_{n \in \mathcal{C}_k} \left( \mathbf{y}_n - \boldsymbol{\mu}_k \right)\left( \mathbf{y}_n - \boldsymbol{\mu}_k \right)^{\mathrm{T}}   (4.47)

and

\mathbf{s}_{\mathrm{B}} = \sum_{k=1}^{K} N_k \left( \boldsymbol{\mu}_k - \boldsymbol{\mu} \right)\left( \boldsymbol{\mu}_k - \boldsymbol{\mu} \right)^{\mathrm{T}}   (4.48)

where

\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} \mathbf{y}_n, \qquad \boldsymbol{\mu} = \frac{1}{N} \sum_{k=1}^{K} N_k \boldsymbol{\mu}_k.   (4.49)
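The decomposition (4.45) can be checked directly from the definitions. The following sketch (assuming NumPy and synthetic data with K = 3 classes; an illustration rather than code from the text) builds S_W, S_B, and S_T and verifies that they add up:

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: K = 3 classes in a D = 4 dimensional input space (assumed).
classes = [rng.normal(loc=c, scale=1.0, size=(n, 4))
           for c, n in [(0.0, 30), (2.0, 50), (-3.0, 20)]]

X = np.vstack(classes)
N = len(X)
m = X.mean(axis=0)                      # total mean, (4.44)

SW = np.zeros((4, 4))
SB = np.zeros((4, 4))
for Xk in classes:
    Nk = len(Xk)
    mk = Xk.mean(axis=0)                # class mean, (4.42)
    SW += (Xk - mk).T @ (Xk - mk)       # within-class term, (4.41)
    SB += Nk * np.outer(mk - m, mk - m) # between-class term, (4.46)

ST = (X - m).T @ (X - m)                # total covariance matrix, (4.43)

print(np.allclose(ST, SW + SB))         # True: S_T = S_W + S_B, as in (4.45)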
Again we wish to construct a scalar that is large when the between-class covariance is large and when the within-class covariance is small. There are now many possible choices of criterion (Fukunaga, 1990). One example is given by

J(\mathbf{W}) = \mathrm{Tr}\left\{ \mathbf{s}_{\mathrm{W}}^{-1} \mathbf{s}_{\mathrm{B}} \right\}.   (4.50)

This criterion can then be rewritten as an explicit function of the projection matrix W in the form

J(\mathbf{W}) = \mathrm{Tr}\left\{ \left( \mathbf{W}^{\mathrm{T}} \mathbf{S}_{\mathrm{W}} \mathbf{W} \right)^{-1} \left( \mathbf{W}^{\mathrm{T}} \mathbf{S}_{\mathrm{B}} \mathbf{W} \right) \right\}.   (4.51)

Maximization of such criteria is straightforward, though somewhat involved, and is discussed at length in Fukunaga (1990).
The weight values are determined by those eigenvectors of S_W^{-1} S_B that correspond to the D′ largest eigenvalues.

There is one important result that is common to all such criteria, which is worth emphasizing. We first note from (4.46) that S_B is composed of the sum of K matrices, each of which is an outer product of two vectors and therefore of rank 1. In addition, only (K − 1) of these matrices are independent as a result of the constraint (4.44). Thus, S_B has rank at most equal to (K − 1) and so there are at most (K − 1) nonzero eigenvalues. This shows that the projection onto the (K − 1)-dimensional subspace spanned by the eigenvectors of S_B does not alter the value of J(W), and so we are therefore unable to find more than (K − 1) linear 'features' by this means (Fukunaga, 1990).
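A possible implementation of this projection (a sketch assuming NumPy; the function name fisher_projection and the synthetic data are illustrative choices, not from the text) forms S_W^{-1} S_B, keeps the eigenvectors with the D′ largest eigenvalues as the columns of W, and, because rank(S_B) is at most K − 1, finds at most K − 1 nonzero eigenvalues:

import numpy as np

def fisher_projection(class_data, d_prime):
    """Return a D x d_prime projection matrix W whose columns are the
    eigenvectors of S_W^{-1} S_B with the largest eigenvalues."""
    X = np.vstack(class_data)
    D = X.shape[1]
    m = X.mean(axis=0)
    SW = np.zeros((D, D))
    SB = np.zeros((D, D))
    for Xk in class_data:
        mk = Xk.mean(axis=0)
        SW += (Xk - mk).T @ (Xk - mk)
        SB += len(Xk) * np.outer(mk - m, mk - m)
    # Eigen-decomposition of S_W^{-1} S_B (not symmetric in general, so the
    # eigenvalues may carry tiny imaginary parts numerically).
    evals, evecs = np.linalg.eig(np.linalg.solve(SW, SB))
    order = np.argsort(-evals.real)
    # At most K - 1 eigenvalues are nonzero, so d_prime <= K - 1 is sensible.
    return evecs[:, order[:d_prime]].real

# Example with K = 3 classes in D = 4 dimensions: at most 2 useful features.
rng = np.random.default_rng(2)
class_data = [rng.normal(loc=c, scale=1.0, size=(n, 4))
              for c, n in [(0.0, 30), (2.0, 50), (-3.0, 20)]]
W = fisher_projection(class_data, d_prime=2)
Y = np.vstack(class_data) @ W        # projected features y = W^T x, row-wise
print(W.shape, Y.shape)              # (4, 2) (100, 2)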
4.1.7 The perceptron algorithm

Another example of a linear discriminant model is the perceptron of Rosenblatt (1962), which occupies an important place in the history of pattern recognition algorithms. It corresponds to a two-class model in which the input vector x is first transformed using a fixed nonlinear transformation to give a feature vector φ(x), and this is then used to construct a generalized linear model of the form

y(\mathbf{x}) = f\left( \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}) \right)   (4.52)

where the nonlinear activation function f(·) is given by a step function of the form

f(a) = \begin{cases} +1, & a \geqslant 0 \\ -1, & a < 0. \end{cases}   (4.53)

The vector φ(x) will typically include a bias component φ_0(x) = 1.
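As a minimal concrete rendering of (4.52) and (4.53), the following sketch (assuming NumPy and, purely for illustration, the feature vector φ(x) = (1, x) so that φ_0(x) = 1 supplies the bias) evaluates the perceptron output for a single input:

import numpy as np

def perceptron_predict(x, w):
    """Evaluate y(x) = f(w^T phi(x)) with the step activation (4.53),
    using the illustrative feature map phi(x) = (1, x)."""
    phi = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    a = w @ phi
    return 1 if a >= 0 else -1

# Example: w = (w0, w1, w2), with w0 the bias weight (values are arbitrary).
w = np.array([-1.0, 2.0, 0.5])
print(perceptron_predict([1.0, -0.5], w))   # +1, since a = -1 + 2 - 0.25 >= 0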
In earlier discussions of two-class classification problems, we have focussed on a target coding scheme in which t ∈ {0, 1}, which is appropriate in the context of probabilistic models. For the perceptron, however, it is more convenient to use target values t = +1 for class C1 and t = −1 for class C2, which matches the choice of activation function.

The algorithm used to determine the parameters w of the perceptron can most easily be motivated by error function minimization. A natural choice of error function would be the total number of misclassified patterns. However, this does not lead to a simple learning algorithm because the error is a piecewise constant function of w, with discontinuities wherever a change in w causes the decision boundary to move across one of the data points.
Methods based on changing w using the gradient of the error function cannot then be applied, because the gradient is zero almost everywhere.

We therefore consider an alternative error function known as the perceptron criterion. To derive this, we note that we are seeking a weight vector w such that patterns x_n in class C1 will have w^T φ(x_n) > 0, whereas patterns x_n in class C2 have w^T φ(x_n) < 0. Using the t ∈ {−1, +1} target coding scheme it follows that we would like all patterns to satisfy w^T φ(x_n) t_n > 0. The perceptron criterion associates zero error with any pattern that is correctly classified, whereas for a misclassified pattern x_n it tries to minimize the quantity −w^T φ(x_n) t_n. The perceptron criterion is therefore given by

E_{\mathrm{P}}(\mathbf{w}) = - \sum_{n \in \mathcal{M}} \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}_n t_n   (4.54)

where M denotes the set of all misclassified patterns. The contribution to the error associated with a particular misclassified pattern is a linear function of w in regions of w space where the pattern is misclassified and zero in regions where it is correctly classified. The total error function is therefore piecewise linear.

Frank Rosenblatt (1928–1969). Rosenblatt's perceptron played an important role in the history of machine learning. Initially, Rosenblatt simulated the perceptron on an IBM 704 computer at Cornell in 1957, but by the early 1960s he had built special-purpose hardware that provided a direct, parallel implementation of perceptron learning. Many of his ideas were encapsulated in "Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms", published in 1962. Rosenblatt's work was criticized by Marvin Minsky, whose objections were published in the book "Perceptrons", co-authored with Seymour Papert. This book was widely misinterpreted at the time as showing that neural networks were fatally flawed and could only learn solutions for linearly separable problems. In fact, it only proved such limitations in the case of single-layer networks such as the perceptron and merely conjectured (incorrectly) that they applied to more general network models. Unfortunately, however, this book contributed to the substantial decline in research funding for neural computing, a situation that was not reversed until the mid-1980s. Today, there are many hundreds, if not thousands, of applications of neural networks in widespread use, with examples in areas such as handwriting recognition and information retrieval being used routinely by millions of people.

We now apply the stochastic gradient descent algorithm (Section 3.1.3) to this error function. The change in the weight vector w is then given by

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_{\mathrm{P}}(\mathbf{w}) = \mathbf{w}^{(\tau)} + \eta \boldsymbol{\phi}_n t_n   (4.55)

where η is the learning rate parameter and τ is an integer that indexes the steps of the algorithm. Because the perceptron function y(x, w) is unchanged if we multiply w by a constant, we can set the learning rate parameter η equal to 1 without loss of generality. Note that, as the weight vector evolves during training, the set of patterns that are misclassified will change.

The perceptron learning algorithm has a simple interpretation, as follows. We cycle through the training patterns in turn, and for each pattern x_n we evaluate the perceptron function (4.52). If the pattern is correctly classified, then the weight vector remains unchanged, whereas if it is incorrectly classified, then for class C1 we add the vector φ(x_n) onto the current estimate of the weight vector w while for class C2 we subtract the vector φ(x_n) from w.
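The cycling procedure just described can be sketched in a few lines (assuming NumPy, a linearly separable synthetic data set, and the illustrative feature map φ(x) = (1, x); this is an illustration, not code from the text), applying the update (4.55) with η = 1 to each misclassified pattern:

import numpy as np

def perceptron_train(Phi, t, max_epochs=100):
    """Perceptron learning: cycle through the patterns and, for each
    misclassified one, apply w <- w + phi_n * t_n, i.e. (4.55) with eta = 1."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for phi_n, t_n in zip(Phi, t):
            if np.sign(w @ phi_n) * t_n <= 0:   # misclassified (or on boundary)
                w += phi_n * t_n
                errors += 1
        if errors == 0:                          # all patterns correctly classified
            break
    return w

# Linearly separable synthetic data; phi(x) = (1, x) adds the bias feature.
rng = np.random.default_rng(3)
X1 = rng.normal(loc=[-2.0, -2.0], size=(50, 2))   # class C1, t = +1
X2 = rng.normal(loc=[+2.0, +2.0], size=(50, 2))   # class C2, t = -1
X = np.vstack([X1, X2])
t = np.concatenate([np.ones(50), -np.ones(50)])
Phi = np.hstack([np.ones((100, 1)), X])

w = perceptron_train(Phi, t)
print(np.all(np.sign(Phi @ w) == t))   # True once a separating w has been found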