Bishop C.M., Pattern Recognition and Machine Learning (2006)
The perceptron learning algorithm is illustrated in Figure 4.7.

If we consider the effect of a single update in the perceptron learning algorithm, we see that the contribution to the error from a misclassified pattern will be reduced because from (4.55) we have

$$
-\mathbf{w}^{(\tau+1)\mathrm{T}}\boldsymbol{\phi}_n t_n
= -\mathbf{w}^{(\tau)\mathrm{T}}\boldsymbol{\phi}_n t_n - (\boldsymbol{\phi}_n t_n)^{\mathrm{T}}\boldsymbol{\phi}_n t_n
< -\mathbf{w}^{(\tau)\mathrm{T}}\boldsymbol{\phi}_n t_n
\tag{4.56}
$$

where we have set $\eta = 1$, and made use of $\|\boldsymbol{\phi}_n t_n\|^2 > 0$. Of course, this does not imply that the contribution to the error function from the other misclassified patterns will have been reduced, as the short check below also makes clear.
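As a quick numerical illustration of (4.56), the following sketch (with arbitrary illustrative values for $\mathbf{w}^{(\tau)}$, $\boldsymbol{\phi}_n$ and $t_n$, not taken from the text) confirms that a single update with $\eta = 1$ reduces the error contribution of the pattern used for the update by exactly $\|\boldsymbol{\phi}_n t_n\|^2$.

```python
import numpy as np

# Illustrative (assumed) values: a pattern misclassified under the current weights.
w_old = np.array([0.5, -0.3, 0.2])   # current weight vector w^(tau)
phi_n = np.array([1.0, 2.0, -1.0])   # basis-function vector phi(x_n)
t_n = 1.0                            # target coded as +1 / -1

assert w_old @ phi_n * t_n < 0       # misclassified under w^(tau)

w_new = w_old + phi_n * t_n          # the update (4.55) with eta = 1
before = -(w_old @ phi_n) * t_n      # this pattern's contribution to the criterion
after = -(w_new @ phi_n) * t_n
assert np.isclose(after, before - phi_n @ phi_n)  # the equality in (4.56)
assert after < before                             # strict decrease
```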
Furthermore, the change in weight vector may have caused some previously correctly classified patterns to become misclassified. Thus the perceptron learning rule is not guaranteed to reduce the total error function at each stage.

However, the perceptron convergence theorem states that if there exists an exact solution (in other words, if the training data set is linearly separable), then the perceptron learning algorithm is guaranteed to find an exact solution in a finite number of steps.
Proofs of this theorem can be found for example in Rosenblatt (1962), Block (1962), Nilsson (1965), Minsky and Papert (1969), Hertz et al. (1991), and Bishop (1995a). Note, however, that the number of steps required to achieve convergence could still be substantial, and in practice, until convergence is achieved, we will not be able to distinguish between a nonseparable problem and one that is simply slow to converge.

Even when the data set is linearly separable, there may be many solutions, and which one is found will depend on the initialization of the parameters and on the order of presentation of the data points. Furthermore, for data sets that are not linearly separable, the perceptron learning algorithm will never converge.
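The following is a minimal sketch of the sequential perceptron algorithm described above, using NumPy and a small hand-constructed linearly separable toy set; the function name, the data, and the maximum-epoch cap are illustrative choices, not part of the original text.

```python
import numpy as np

def perceptron_train(Phi, t, eta=1.0, max_epochs=1000):
    """Sequential perceptron learning (4.55): cycle through the patterns and,
    whenever phi_n is misclassified, add eta * phi_n * t_n to the weights.
    Targets t are coded as +1/-1; Phi already includes a bias feature."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for phi_n, t_n in zip(Phi, t):
            if w @ phi_n * t_n <= 0:       # misclassified (or on the boundary)
                w += eta * phi_n * t_n
                errors += 1
        if errors == 0:                    # every pattern correctly classified
            return w
    return w                               # may not have converged

# Toy linearly separable data (assumed for illustration): identity basis
# functions plus a constant bias feature.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -0.5], [-2.0, -1.5]])
t = np.array([1.0, 1.0, -1.0, -1.0])
Phi = np.hstack([np.ones((X.shape[0], 1)), X])
w = perceptron_train(Phi, t)
print(np.sign(Phi @ w))   # matches t for this separable toy set
```

On a data set that is not linearly separable the loop simply exhausts max_epochs without the error count reaching zero, which reflects the point made above that the algorithm never converges in that case.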
Figure 4.7 Illustration of the convergence of the perceptron learning algorithm, showing data points from two classes (red and blue) in a two-dimensional feature space $(\phi_1, \phi_2)$. The top left plot shows the initial parameter vector $\mathbf{w}$ shown as a black arrow together with the corresponding decision boundary (black line), in which the arrow points towards the decision region which is classified as belonging to the red class. The data point circled in green is misclassified and so its feature vector is added to the current weight vector, giving the new decision boundary shown in the top right plot.
The bottom left plot shows the next misclassified point to be considered, indicated by the green circle, and its feature vector is again added to the weight vector, giving the decision boundary shown in the bottom right plot for which all data points are correctly classified.

Figure 4.8 Illustration of the Mark 1 perceptron hardware. The photograph on the left shows how the inputs were obtained using a simple camera system in which an input scene, in this case a printed character, was illuminated by powerful lights, and an image focussed onto a 20 × 20 array of cadmium sulphide photocells, giving a primitive 400 pixel image. The perceptron also had a patch board, shown in the middle photograph, which allowed different configurations of input features to be tried.
Often these were wired up at random to demonstrate the ability of the perceptron to learn without the need for precise wiring, in contrast to a modern digital computer. The photograph on the right shows one of the racks of adaptive weights. Each weight was implemented using a rotary variable resistor, also called a potentiometer, driven by an electric motor, thereby allowing the value of the weight to be adjusted automatically by the learning algorithm.

Aside from difficulties with the learning algorithm, the perceptron does not provide probabilistic outputs, nor does it generalize readily to $K > 2$ classes. The most important limitation, however, arises from the fact that (in common with all of the models discussed in this chapter and the previous one) it is based on linear combinations of fixed basis functions.
More detailed discussions of the limitations of perceptrons can be found in Minsky and Papert (1969) and Bishop (1995a).

Analogue hardware implementations of the perceptron were built by Rosenblatt, based on motor-driven variable resistors to implement the adaptive parameters $w_j$. These are illustrated in Figure 4.8.
The inputs were obtained from a simple camera system based on an array of photo-sensors, while the basis functions $\boldsymbol{\phi}$ could be chosen in a variety of ways, for example based on simple fixed functions of randomly chosen subsets of pixels from the input image. Typical applications involved learning to discriminate simple shapes or characters.

At the same time that the perceptron was being developed, a closely related system called the adaline, which is short for 'adaptive linear element', was being explored by Widrow and co-workers. The functional form of the model was the same as for the perceptron, but a different approach to training was adopted (Widrow and Hoff, 1960; Widrow and Lehr, 1990).
4.2. Probabilistic Generative Models

We turn next to a probabilistic view of classification and show how models with linear decision boundaries arise from simple assumptions about the distribution of the data. In Section 1.5.4, we discussed the distinction between the discriminative and the generative approaches to classification. Here we shall adopt a generative approach in which we model the class-conditional densities $p(\mathbf{x}|\mathcal{C}_k)$, as well as the class priors $p(\mathcal{C}_k)$, and then use these to compute posterior probabilities $p(\mathcal{C}_k|\mathbf{x})$ through Bayes' theorem.

Figure 4.9 Plot of the logistic sigmoid function $\sigma(a)$ defined by (4.59), shown in red, together with the scaled probit function $\Phi(\lambda a)$, for $\lambda^2 = \pi/8$, shown in dashed blue, where $\Phi(a)$ is defined by (4.114). The scaling factor $\pi/8$ is chosen so that the derivatives of the two curves are equal for $a = 0$.

Consider first of all the case of two classes.
The posterior probability for class $\mathcal{C}_1$ can be written as

$$
p(\mathcal{C}_1|\mathbf{x})
= \frac{p(\mathbf{x}|\mathcal{C}_1)p(\mathcal{C}_1)}{p(\mathbf{x}|\mathcal{C}_1)p(\mathcal{C}_1) + p(\mathbf{x}|\mathcal{C}_2)p(\mathcal{C}_2)}
= \frac{1}{1 + \exp(-a)} = \sigma(a)
\tag{4.57}
$$

where we have defined

$$
a = \ln\frac{p(\mathbf{x}|\mathcal{C}_1)p(\mathcal{C}_1)}{p(\mathbf{x}|\mathcal{C}_2)p(\mathcal{C}_2)}
\tag{4.58}
$$

and $\sigma(a)$ is the logistic sigmoid function defined by

$$
\sigma(a) = \frac{1}{1 + \exp(-a)}
\tag{4.59}
$$

which is plotted in Figure 4.9. The term 'sigmoid' means S-shaped. This type of function is sometimes also called a 'squashing function' because it maps the whole real axis into a finite interval.
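As a small numeric check of the equivalence expressed in (4.57) and (4.58), the sketch below (with arbitrary assumed values for the class-conditional densities and priors at a single point $\mathbf{x}$) confirms that the Bayes ratio and the sigmoid of the log ratio give the same posterior.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))   # (4.59)

# Assumed illustrative values of p(x|C1), p(x|C2) at some x, and of the priors.
px_c1, px_c2 = 0.4, 0.1
p_c1, p_c2 = 0.3, 0.7

posterior_bayes = px_c1 * p_c1 / (px_c1 * p_c1 + px_c2 * p_c2)  # first line of (4.57)
a = np.log((px_c1 * p_c1) / (px_c2 * p_c2))                     # (4.58)
assert np.isclose(posterior_bayes, sigmoid(a))                  # sigmoid form of (4.57)
```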
The logistic sigmoid has been encountered already in earlier chapters and plays an important role in many classification algorithms. It satisfies the following symmetry property

$$
\sigma(-a) = 1 - \sigma(a)
\tag{4.60}
$$

as is easily verified. The inverse of the logistic sigmoid is given by

$$
a = \ln\left(\frac{\sigma}{1 - \sigma}\right)
\tag{4.61}
$$

and is known as the logit function. It represents the log of the ratio of probabilities $\ln\left[p(\mathcal{C}_1|\mathbf{x})/p(\mathcal{C}_2|\mathbf{x})\right]$ for the two classes, also known as the log odds.
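The symmetry property (4.60) and the inverse relationship (4.61) are easy to confirm numerically; the following short sketch does so at a handful of arbitrary test points.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))   # (4.59)

def logit(s):
    return np.log(s / (1.0 - s))      # (4.61), the inverse of the logistic sigmoid

a = np.linspace(-5.0, 5.0, 11)        # arbitrary test points
assert np.allclose(sigmoid(-a), 1.0 - sigmoid(a))   # symmetry (4.60)
assert np.allclose(logit(sigmoid(a)), a)            # logit inverts the sigmoid
```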
Note that in (4.57) we have simply rewritten the posterior probabilities in an equivalent form, and so the appearance of the logistic sigmoid may seem rather vacuous. However, it will have significance provided $a(\mathbf{x})$ takes a simple functional form. We shall shortly consider situations in which $a(\mathbf{x})$ is a linear function of $\mathbf{x}$, in which case the posterior probability is governed by a generalized linear model.

For the case of $K > 2$ classes, we have

$$
p(\mathcal{C}_k|\mathbf{x})
= \frac{p(\mathbf{x}|\mathcal{C}_k)p(\mathcal{C}_k)}{\sum_j p(\mathbf{x}|\mathcal{C}_j)p(\mathcal{C}_j)}
= \frac{\exp(a_k)}{\sum_j \exp(a_j)}
\tag{4.62}
$$

which is known as the normalized exponential and can be regarded as a multiclass generalization of the logistic sigmoid.
Here the quantities $a_k$ are defined by

$$
a_k = \ln p(\mathbf{x}|\mathcal{C}_k)p(\mathcal{C}_k).
\tag{4.63}
$$

The normalized exponential is also known as the softmax function, as it represents a smoothed version of the 'max' function because, if $a_k \gg a_j$ for all $j \neq k$, then $p(\mathcal{C}_k|\mathbf{x}) \simeq 1$, and $p(\mathcal{C}_j|\mathbf{x}) \simeq 0$.
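Below is a minimal sketch of the normalized exponential (4.62) with $a_k$ given by (4.63); the joint values $p(\mathbf{x}|\mathcal{C}_k)p(\mathcal{C}_k)$ are assumed illustrative numbers, and subtracting the maximum before exponentiating is a standard numerical-stability device rather than something required by the formula.

```python
import numpy as np

def softmax(a):
    """Normalized exponential (4.62); subtracting the max leaves the result unchanged."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

# With a_k = ln p(x|C_k)p(C_k), the softmax of the a_k reproduces Bayes' theorem.
joint = np.array([0.12, 0.07, 0.01])        # assumed values of p(x|C_k)p(C_k)
a = np.log(joint)                           # (4.63)
assert np.allclose(softmax(a), joint / joint.sum())

# 'Soft max': if one a_k dominates, the corresponding posterior approaches 1.
print(softmax(np.array([10.0, 1.0, 0.0])))  # roughly [0.9998, 0.0001, 0.0000]
```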
We now investigate the consequences of choosing specific forms for the class-conditional densities, looking first at continuous input variables $\mathbf{x}$ and then discussing briefly the case of discrete inputs.

4.2.1 Continuous inputs

Let us assume that the class-conditional densities are Gaussian and then explore the resulting form for the posterior probabilities. To start with, we shall assume that all classes share the same covariance matrix. Thus the density for class $\mathcal{C}_k$ is given by

$$
p(\mathbf{x}|\mathcal{C}_k)
= \frac{1}{(2\pi)^{D/2}}\frac{1}{|\boldsymbol{\Sigma}|^{1/2}}
\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}_k)\right\}.
\tag{4.64}
$$

Consider first the case of two classes. From (4.57) and (4.58), we have

$$
p(\mathcal{C}_1|\mathbf{x}) = \sigma(\mathbf{w}^{\mathrm{T}}\mathbf{x} + w_0)
\tag{4.65}
$$

where we have defined

$$
\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)
\tag{4.66}
$$

$$
w_0 = -\frac{1}{2}\boldsymbol{\mu}_1^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1
+ \frac{1}{2}\boldsymbol{\mu}_2^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2
+ \ln\frac{p(\mathcal{C}_1)}{p(\mathcal{C}_2)}.
\tag{4.67}
$$

We see that the quadratic terms in $\mathbf{x}$ from the exponents of the Gaussian densities have cancelled (due to the assumption of common covariance matrices) leading to a linear function of $\mathbf{x}$ in the argument of the logistic sigmoid.
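The following sketch evaluates (4.65) to (4.67) for assumed illustrative means, shared covariance and priors, and cross-checks the resulting sigmoid posterior against a direct application of Bayes' theorem using the Gaussian density (4.64).

```python
import numpy as np

# Assumed illustrative parameters: two Gaussian classes with a shared covariance.
mu1 = np.array([1.0, 1.0])
mu2 = np.array([-1.0, -0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p1, p2 = 0.6, 0.4                      # class priors p(C1), p(C2)

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)                                    # (4.66)
w0 = (-0.5 * (mu1 @ Sigma_inv @ mu1)
      + 0.5 * (mu2 @ Sigma_inv @ mu2)
      + np.log(p1 / p2))                                       # (4.67)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))                            # (4.59)

def gauss(x, mu):
    """Gaussian density (4.64) with the shared covariance Sigma."""
    d = x - mu
    norm = (2.0 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ Sigma_inv @ d) / norm

x = np.array([0.2, -0.1])
posterior_linear = sigmoid(w @ x + w0)                         # (4.65)
num = gauss(x, mu1) * p1
posterior_bayes = num / (num + gauss(x, mu2) * p2)             # two-class Bayes (4.57)
assert np.isclose(posterior_linear, posterior_bayes)
```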
This result is illustrated for the case of a two-dimensional input space $\mathbf{x}$ in Figure 4.10.

Figure 4.10 The left-hand plot shows the class-conditional densities for two classes, denoted red and blue. On the right is the corresponding posterior probability $p(\mathcal{C}_1|\mathbf{x})$, which is given by a logistic sigmoid of a linear function of $\mathbf{x}$. The surface in the right-hand plot is coloured using a proportion of red ink given by $p(\mathcal{C}_1|\mathbf{x})$ and a proportion of blue ink given by $p(\mathcal{C}_2|\mathbf{x}) = 1 - p(\mathcal{C}_1|\mathbf{x})$.

The resulting decision boundaries correspond to surfaces along which the posterior probabilities $p(\mathcal{C}_k|\mathbf{x})$ are constant and so will be given by linear functions of $\mathbf{x}$, and therefore the decision boundaries are linear in input space.
The prior probabilities $p(\mathcal{C}_k)$ enter only through the bias parameter $w_0$ so that changes in the priors have the effect of making parallel shifts of the decision boundary and, more generally, of the parallel contours of constant posterior probability.

For the general case of $K$ classes we have, from (4.62) and (4.63),

$$
a_k(\mathbf{x}) = \mathbf{w}_k^{\mathrm{T}}\mathbf{x} + w_{k0}
\tag{4.68}
$$

where we have defined

$$
\mathbf{w}_k = \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_k
\tag{4.69}
$$

$$
w_{k0} = -\frac{1}{2}\boldsymbol{\mu}_k^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_k + \ln p(\mathcal{C}_k).
\tag{4.70}
$$

We see that the $a_k(\mathbf{x})$ are again linear functions of $\mathbf{x}$ as a consequence of the cancellation of the quadratic terms due to the shared covariances. The resulting decision boundaries, corresponding to the minimum misclassification rate, will occur when two of the posterior probabilities (the two largest) are equal, and so will be defined by linear functions of $\mathbf{x}$, and so again we have a generalized linear model.

If we relax the assumption of a shared covariance matrix and allow each class-conditional density $p(\mathbf{x}|\mathcal{C}_k)$ to have its own covariance matrix $\boldsymbol{\Sigma}_k$, then the earlier cancellations will no longer occur, and we will obtain quadratic functions of $\mathbf{x}$, giving rise to a quadratic discriminant.
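As a closing illustration of (4.68) to (4.70), the sketch below (again with assumed illustrative parameters) builds the linear functions $a_k(\mathbf{x})$ for three classes sharing one covariance matrix and verifies that their softmax matches the posteriors obtained directly from Bayes' theorem; the terms common to all classes, including the quadratic term in $\mathbf{x}$, cancel inside the softmax.

```python
import numpy as np

# Assumed illustrative parameters for K = 3 Gaussian classes with one shared covariance.
mus = np.array([[1.0, 1.0], [-1.0, 0.5], [0.0, -1.5]])   # class means mu_k
priors = np.array([0.5, 0.3, 0.2])                        # class priors p(C_k)
Sigma = np.array([[1.0, 0.2], [0.2, 1.5]])
Sigma_inv = np.linalg.inv(Sigma)

W = mus @ Sigma_inv                                       # row k is w_k^T          (4.69)
w0 = -0.5 * np.einsum('kd,de,ke->k', mus, Sigma_inv, mus) + np.log(priors)        # (4.70)

def softmax(a):
    e = np.exp(a - np.max(a))                             # (4.62), stabilized
    return e / e.sum()

def posteriors(x):
    return softmax(W @ x + w0)                            # a_k(x) = w_k^T x + w_k0  (4.68)

def gauss(x, mu):
    """Gaussian density (4.64) with the shared covariance Sigma."""
    d = x - mu
    norm = (2.0 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ Sigma_inv @ d) / norm

x = np.array([0.3, -0.2])
joint = np.array([gauss(x, mu) * p for mu, p in zip(mus, priors)])
assert np.allclose(posteriors(x), joint / joint.sum())    # agrees with Bayes' theorem
```

Replacing the single Sigma by per-class matrices Sigma_k would reinstate the quadratic term $-\frac{1}{2}\mathbf{x}^{\mathrm{T}}\boldsymbol{\Sigma}_k^{-1}\mathbf{x}$, which differs across classes and no longer cancels, giving the quadratic discriminant mentioned above.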