Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), page 48

File No. 811375: Bishop C.M. Pattern Recognition and Machine Learning (2006).pdf, page 48 (StudIzba, 2020-08-25)


Text from the file (page 48)

The linear and quadratic decision boundaries are illustrated in Figure 4.11.

4. LINEAR MODELS FOR CLASSIFICATION

Figure 4.11: The left-hand plot shows the class-conditional densities for three classes each having a Gaussian distribution, coloured red, green, and blue, in which the red and green classes have the same covariance matrix. The right-hand plot shows the corresponding posterior probabilities, in which the RGB colour vector represents the posterior probabilities for the respective three classes. The decision boundaries are also shown. Notice that the boundary between the red and green classes, which have the same covariance matrix, is linear, whereas those between the other pairs of classes are quadratic.

4.2.2 Maximum likelihood solution

Once we have specified a parametric functional form for the class-conditional densities $p(\mathbf{x}|\mathcal{C}_k)$, we can then determine the values of the parameters, together with the prior class probabilities $p(\mathcal{C}_k)$, using maximum likelihood.

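This requires a data set comprising observations of $\mathbf{x}$ along with their corresponding class labels.

Consider first the case of two classes, each having a Gaussian class-conditional density with a shared covariance matrix, and suppose we have a data set $\{\mathbf{x}_n, t_n\}$ where $n = 1, \ldots, N$. Here $t_n = 1$ denotes class $\mathcal{C}_1$ and $t_n = 0$ denotes class $\mathcal{C}_2$. We denote the prior class probability $p(\mathcal{C}_1) = \pi$, so that $p(\mathcal{C}_2) = 1 - \pi$. For a data point $\mathbf{x}_n$ from class $\mathcal{C}_1$, we have $t_n = 1$ and hence

$$p(\mathbf{x}_n, \mathcal{C}_1) = p(\mathcal{C}_1)\,p(\mathbf{x}_n|\mathcal{C}_1) = \pi\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_1, \boldsymbol{\Sigma}).$$

Similarly for class $\mathcal{C}_2$, we have $t_n = 0$ and hence

$$p(\mathbf{x}_n, \mathcal{C}_2) = p(\mathcal{C}_2)\,p(\mathbf{x}_n|\mathcal{C}_2) = (1 - \pi)\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_2, \boldsymbol{\Sigma}).$$

Thus the likelihood function is given by

$$p(\mathbf{t}|\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}) = \prod_{n=1}^{N} \left[\pi\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_1, \boldsymbol{\Sigma})\right]^{t_n} \left[(1 - \pi)\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_2, \boldsymbol{\Sigma})\right]^{1 - t_n} \tag{4.71}$$

where $\mathbf{t} = (t_1, \ldots, t_N)^{\mathrm{T}}$.

4.2. Probabilistic Generative Models

As usual, it is convenient to maximize the log of the likelihood function. Consider first the maximization with respect to $\pi$. The terms in the log likelihood function that depend on $\pi$ are

$$\sum_{n=1}^{N} \left\{ t_n \ln \pi + (1 - t_n) \ln(1 - \pi) \right\}. \tag{4.72}$$

Setting the derivative with respect to $\pi$ equal to zero and rearranging, we obtain

$$\pi = \frac{1}{N} \sum_{n=1}^{N} t_n = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2} \tag{4.73}$$

where $N_1$ denotes the total number of data points in class $\mathcal{C}_1$, and $N_2$ denotes the total number of data points in class $\mathcal{C}_2$.

[Margin note: Exercise 4.9]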
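As a quick numerical check of (4.73), the sketch below evaluates the $\pi$-dependent terms (4.72) on a grid and compares the grid maximizer with the closed-form estimate $N_1/N$. The label vector `t`, the grid resolution, and the function name `prior_log_likelihood` are illustrative assumptions, not part of the text.

```python
import math

def prior_log_likelihood(pi, t):
    """Terms of the log likelihood that depend on pi, as in (4.72)."""
    return sum(tn * math.log(pi) + (1 - tn) * math.log(1 - pi) for tn in t)

# Hypothetical labels: t_n = 1 for class C1, t_n = 0 for class C2.
t = [1, 0, 1, 1, 0, 1, 0, 1]
N = len(t)
N1 = sum(t)

# Closed-form maximum likelihood estimate (4.73): the fraction of points in C1.
pi_ml = N1 / N

# Brute-force check: maximize (4.72) over a grid of candidate values in (0, 1).
grid = [k / 1000 for k in range(1, 1000)]
pi_grid = max(grid, key=lambda p: prior_log_likelihood(p, t))

print(pi_ml, pi_grid)
```

Because (4.72) is concave in $\pi$, the grid search and the closed form should agree up to the grid spacing.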

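Thus the maximum likelihood estimate for $\pi$ is simply the fraction of points in class $\mathcal{C}_1$, as expected. This result is easily generalized to the multiclass case, where again the maximum likelihood estimate of the prior probability associated with class $\mathcal{C}_k$ is given by the fraction of the training set points assigned to that class.

Now consider the maximization with respect to $\boldsymbol{\mu}_1$. Again we can pick out of the log likelihood function those terms that depend on $\boldsymbol{\mu}_1$, giving

$$\sum_{n=1}^{N} t_n \ln \mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_1, \boldsymbol{\Sigma}) = -\frac{1}{2} \sum_{n=1}^{N} t_n (\mathbf{x}_n - \boldsymbol{\mu}_1)^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_1) + \text{const}. \tag{4.74}$$

Setting the derivative with respect to $\boldsymbol{\mu}_1$ to zero and rearranging, we obtain

$$\boldsymbol{\mu}_1 = \frac{1}{N_1} \sum_{n=1}^{N} t_n \mathbf{x}_n \tag{4.75}$$

which is simply the mean of all the input vectors $\mathbf{x}_n$ assigned to class $\mathcal{C}_1$.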

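By a similar argument, the corresponding result for $\boldsymbol{\mu}_2$ is given by

$$\boldsymbol{\mu}_2 = \frac{1}{N_2} \sum_{n=1}^{N} (1 - t_n)\,\mathbf{x}_n \tag{4.76}$$

which again is the mean of all the input vectors $\mathbf{x}_n$ assigned to class $\mathcal{C}_2$.

Finally, consider the maximum likelihood solution for the shared covariance matrix $\boldsymbol{\Sigma}$. Picking out the terms in the log likelihood function that depend on $\boldsymbol{\Sigma}$, we have

$$
\begin{aligned}
&-\frac{1}{2} \sum_{n=1}^{N} t_n \ln |\boldsymbol{\Sigma}| - \frac{1}{2} \sum_{n=1}^{N} t_n (\mathbf{x}_n - \boldsymbol{\mu}_1)^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_1) \\
&-\frac{1}{2} \sum_{n=1}^{N} (1 - t_n) \ln |\boldsymbol{\Sigma}| - \frac{1}{2} \sum_{n=1}^{N} (1 - t_n) (\mathbf{x}_n - \boldsymbol{\mu}_2)^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_2) \\
&= -\frac{N}{2} \ln |\boldsymbol{\Sigma}| - \frac{N}{2} \mathrm{Tr}\left\{ \boldsymbol{\Sigma}^{-1} \mathbf{S} \right\}
\end{aligned}
\tag{4.77}
$$

where we have defined

$$\mathbf{S} = \frac{N_1}{N} \mathbf{S}_1 + \frac{N_2}{N} \mathbf{S}_2 \tag{4.78}$$

$$\mathbf{S}_1 = \frac{1}{N_1} \sum_{n \in \mathcal{C}_1} (\mathbf{x}_n - \boldsymbol{\mu}_1)(\mathbf{x}_n - \boldsymbol{\mu}_1)^{\mathrm{T}} \tag{4.79}$$

$$\mathbf{S}_2 = \frac{1}{N_2} \sum_{n \in \mathcal{C}_2} (\mathbf{x}_n - \boldsymbol{\mu}_2)(\mathbf{x}_n - \boldsymbol{\mu}_2)^{\mathrm{T}}. \tag{4.80}$$

[Margin notes: Exercise 4.10, Section 2.3.7]

Using the standard result for the maximum likelihood solution for a Gaussian distribution, we see that $\boldsymbol{\Sigma} = \mathbf{S}$, which represents a weighted average of the covariance matrices associated with each of the two classes separately.

This result is easily extended to the $K$ class problem to obtain the corresponding maximum likelihood solutions for the parameters in which each class-conditional density is Gaussian with a shared covariance matrix.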
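The closed-form estimates (4.73), (4.75), (4.76) and (4.78) to (4.80) for the two-class case can be sketched in plain Python as follows. The function name `fit_shared_covariance_gaussians` and the toy data are assumptions made for illustration; vectors are plain lists so that the example needs no external libraries.

```python
def fit_shared_covariance_gaussians(X, t):
    """Maximum likelihood fit for two Gaussian classes with a shared covariance.

    X is a list of D-dimensional points (lists of floats) and t is a list of
    0/1 labels, with t_n = 1 for class C1 and t_n = 0 for class C2.
    """
    N, D = len(X), len(X[0])
    N1 = sum(t)
    N2 = N - N1

    pi = N1 / N  # prior p(C1), equation (4.73)

    # Class means, equations (4.75) and (4.76).
    mu1 = [sum(tn * x[i] for x, tn in zip(X, t)) / N1 for i in range(D)]
    mu2 = [sum((1 - tn) * x[i] for x, tn in zip(X, t)) / N2 for i in range(D)]

    def class_covariance(points, mu):
        # Average outer product of deviations, equations (4.79) and (4.80).
        n = len(points)
        return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
                 for j in range(D)] for i in range(D)]

    S1 = class_covariance([x for x, tn in zip(X, t) if tn == 1], mu1)
    S2 = class_covariance([x for x, tn in zip(X, t) if tn == 0], mu2)

    # Shared covariance Sigma = S, the weighted average in equation (4.78).
    S = [[(N1 / N) * S1[i][j] + (N2 / N) * S2[i][j] for j in range(D)]
         for i in range(D)]
    return pi, mu1, mu2, S

# Hypothetical toy data: four points in C1, two points in C2.
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [5.0, 5.0], [7.0, 5.0]]
t = [1, 1, 1, 1, 0, 0]
pi, mu1, mu2, S = fit_shared_covariance_gaussians(X, t)
print(pi, mu1, mu2, S)
```

Note that the covariances are normalized by $N_1$ and $N_2$ rather than $N_k - 1$, since these are maximum likelihood estimates and not the unbiased ones.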

Note that the approach of fitting Gaussian distributions to the classes is not robust to outliers, because the maximum likelihood estimation of a Gaussian is not robust.

4.2.3 Discrete features

[Margin note: Section 8.2.2]

Let us now consider the case of discrete feature values $x_i$.

For simplicity, we begin by looking at binary feature values $x_i \in \{0, 1\}$ and discuss the extension to more general discrete features shortly. If there are $D$ inputs, then a general distribution would correspond to a table of $2^D$ numbers for each class, containing $2^D - 1$ independent variables (due to the summation constraint). Because this grows exponentially with the number of features, we might seek a more restricted representation. Here we will make the naive Bayes assumption in which the feature values are treated as independent, conditioned on the class $\mathcal{C}_k$. Thus we have class-conditional distributions of the form

$$p(\mathbf{x}|\mathcal{C}_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i} \tag{4.81}$$

which contain $D$ independent parameters for each class.
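To make the parameter counts concrete, the following sketch evaluates the naive Bayes class-conditional distribution (4.81) for one class, confirms that it sums to one over all $2^D$ binary vectors, and compares the $2^D - 1$ parameters of a full table with the $D$ parameters of the factorized form. The parameter values in `mu_k` and the function name `bernoulli_naive_bayes` are hypothetical.

```python
from itertools import product

def bernoulli_naive_bayes(x, mu):
    """p(x | C_k) from (4.81) for a binary vector x and parameters mu_k1..mu_kD."""
    p = 1.0
    for xi, mui in zip(x, mu):
        # mu^x * (1 - mu)^(1 - x) reduces to mu when x = 1 and 1 - mu when x = 0.
        p *= mui if xi == 1 else (1.0 - mui)
    return p

mu_k = [0.9, 0.2, 0.5]  # hypothetical parameters for a single class, D = 3
D = len(mu_k)

# The factorized distribution is normalized over all 2^D binary vectors.
total = sum(bernoulli_naive_bayes(x, mu_k) for x in product([0, 1], repeat=D))

full_table_params = 2 ** D - 1  # general distribution over D binary features
naive_bayes_params = D          # one mu_ki per feature under naive Bayes

print(total, full_table_params, naive_bayes_params)
```

Even at $D = 3$ the full table needs more than twice as many parameters, and the gap grows exponentially with $D$.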

Substituting into (4.63) then gives

$$a_k(\mathbf{x}) = \sum_{i=1}^{D} \left\{ x_i \ln \mu_{ki} + (1 - x_i) \ln(1 - \mu_{ki}) \right\} + \ln p(\mathcal{C}_k) \tag{4.82}$$

which again are linear functions of the input values $x_i$. For the case of $K = 2$ classes, we can alternatively consider the logistic sigmoid formulation given by (4.57). Analogous results are obtained for discrete variables each of which can take $M > 2$ states.

[Margin note: Exercise 4.11]

4.2.4 Exponential family

As we have seen, for both Gaussian distributed and discrete inputs, the posterior class probabilities are given by generalized linear models with logistic sigmoid ($K = 2$ classes) or softmax ($K \geqslant 2$ classes) activation functions.

These are particular cases of a more general result obtained by assuming that the class-conditional densities $p(\mathbf{x}|\mathcal{C}_k)$ are members of the exponential family of distributions.

Using the form (2.194) for members of the exponential family, we see that the distribution of $\mathbf{x}$ can be written in the form

$$p(\mathbf{x}|\boldsymbol{\lambda}_k) = h(\mathbf{x})\,g(\boldsymbol{\lambda}_k) \exp\left\{ \boldsymbol{\lambda}_k^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \right\}. \tag{4.83}$$

We now restrict attention to the subclass of such distributions for which $\mathbf{u}(\mathbf{x}) = \mathbf{x}$. Then we make use of (2.236) to introduce a scaling parameter $s$, so that we obtain the restricted set of exponential family class-conditional densities of the form

$$p(\mathbf{x}|\boldsymbol{\lambda}_k, s) = \frac{1}{s}\,h\!\left(\frac{1}{s}\mathbf{x}\right) g(\boldsymbol{\lambda}_k) \exp\left\{ \frac{1}{s} \boldsymbol{\lambda}_k^{\mathrm{T}} \mathbf{x} \right\}. \tag{4.84}$$

Note that we are allowing each class to have its own parameter vector $\boldsymbol{\lambda}_k$ but we are assuming that the classes share the same scale parameter $s$.

For the two-class problem, we substitute this expression for the class-conditional densities into (4.58) and we see that the posterior class probability is again given by a logistic sigmoid acting on a linear function $a(\mathbf{x})$ which is given by

$$a(\mathbf{x}) = (\boldsymbol{\lambda}_1 - \boldsymbol{\lambda}_2)^{\mathrm{T}} \mathbf{x} + \ln g(\boldsymbol{\lambda}_1) - \ln g(\boldsymbol{\lambda}_2) + \ln p(\mathcal{C}_1) - \ln p(\mathcal{C}_2). \tag{4.85}$$

Similarly, for the $K$-class problem, we substitute the class-conditional density expression into (4.63) to give

$$a_k(\mathbf{x}) = \boldsymbol{\lambda}_k^{\mathrm{T}} \mathbf{x} + \ln g(\boldsymbol{\lambda}_k) + \ln p(\mathcal{C}_k) \tag{4.86}$$

and so again is a linear function of $\mathbf{x}$.

4.3. Probabilistic Discriminative Models

For the two-class classification problem, we have seen that the posterior probability of class $\mathcal{C}_1$ can be written as a logistic sigmoid acting on a linear function of $\mathbf{x}$, for a wide choice of class-conditional distributions $p(\mathbf{x}|\mathcal{C}_k)$. Similarly, for the multiclass case, the posterior probability of class $\mathcal{C}_k$ is given by a softmax transformation of a linear function of $\mathbf{x}$.
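The two-class statement can be checked numerically for the Gaussian case with shared variance. The sketch below works in one dimension for brevity and compares the posterior computed directly from Bayes' theorem with a logistic sigmoid of $wx + w_0$. The expressions used for $w$ and $w_0$ are the standard ones for this model; they do not appear in this excerpt, so treat them as an assumption here, along with all parameter values.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def sigmoid(a):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-a))

# Hypothetical parameters: class means, shared variance, and prior p(C1).
mu1, mu2, var, prior1 = 1.0, -1.0, 2.0, 0.3

# Linear-function parameters (assumed standard result for shared variance).
w = (mu1 - mu2) / var
w0 = -mu1 ** 2 / (2 * var) + mu2 ** 2 / (2 * var) + math.log(prior1 / (1 - prior1))

def posterior_bayes(x):
    """p(C1 | x) from Bayes' theorem applied to the two joint densities."""
    joint1 = prior1 * gaussian_pdf(x, mu1, var)
    joint2 = (1 - prior1) * gaussian_pdf(x, mu2, var)
    return joint1 / (joint1 + joint2)

def posterior_sigmoid(x):
    """p(C1 | x) written as a logistic sigmoid of a linear function of x."""
    return sigmoid(w * x + w0)

xs = [-3.0, -0.5, 0.0, 1.2, 4.0]
max_gap = max(abs(posterior_bayes(x) - posterior_sigmoid(x)) for x in xs)
print(max_gap)
```

The quadratic terms in $x$ cancel between the two classes because the variance is shared, which is why the argument of the sigmoid is linear.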

For specific choices of the class-conditional densities $p(\mathbf{x}|\mathcal{C}_k)$, we have used maximum likelihood to determine the parameters of the densities as well as the class priors $p(\mathcal{C}_k)$ and then used Bayes' theorem to find the posterior class probabilities.

However, an alternative approach is to use the functional form of the generalized linear model explicitly and to determine its parameters directly by using maximum likelihood. We shall see that there is an efficient algorithm for finding such solutions known as iterative reweighted least squares, or IRLS.

Figure 4.12: Illustration of the role of nonlinear basis functions in linear classification models. The left plot shows the original input space $(x_1, x_2)$ together with data points from two classes labelled red and blue. Two 'Gaussian' basis functions $\phi_1(\mathbf{x})$ and $\phi_2(\mathbf{x})$ are defined in this space with centres shown by the green crosses and with contours shown by the green circles. The right-hand plot shows the corresponding feature space $(\phi_1, \phi_2)$ together with the linear decision boundary obtained by a logistic regression model of the form discussed in Section 4.3.2. This corresponds to a nonlinear decision boundary in the original input space, shown by the black curve in the left-hand plot.

The indirect approach to finding the parameters of a generalized linear model, by fitting class-conditional densities and class priors separately and then applying Bayes' theorem, represents an example of generative modelling, because we could take such a model and generate synthetic data by drawing values of $\mathbf{x}$ from the marginal distribution $p(\mathbf{x})$.
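Drawing synthetic data from a fitted generative model can be sketched as ancestral sampling: first draw a class from the prior, then draw $x$ from that class's Gaussian. Discarding the labels leaves samples from the marginal $p(x)$. The example is one-dimensional for brevity, and the function name `sample_generative`, the parameter values, and the seed are all hypothetical.

```python
import random

def sample_generative(n_samples, prior1, mu1, mu2, sigma, seed=0):
    """Draw (x, label) pairs from a two-class Gaussian generative model."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Step 1: draw the class label from the prior p(C1) = prior1.
        label = 1 if rng.random() < prior1 else 0
        # Step 2: draw x from the class-conditional Gaussian (shared sigma).
        mean = mu1 if label == 1 else mu2
        samples.append((rng.gauss(mean, sigma), label))
    return samples

data = sample_generative(10000, prior1=0.3, mu1=1.0, mu2=-1.0, sigma=1.0)

# Dropping the labels gives draws from the marginal p(x).
xs = [x for x, _ in data]

frac_c1 = sum(label for _, label in data) / len(data)
mean_x = sum(xs) / len(xs)
print(frac_c1, mean_x)
```

With these settings the sample fraction of class $\mathcal{C}_1$ should be close to 0.3 and the sample mean of $x$ close to $0.3 \cdot 1 + 0.7 \cdot (-1) = -0.4$.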

In the direct approach, we are maximizing a likelihood function defined through the conditional distribution $p(\mathcal{C}_k|\mathbf{x})$, which represents a form of discriminative training. One advantage of the discriminative approach is that there will typically be fewer adaptive parameters to be determined, as we shall see shortly. It may also lead to improved predictive performance, particularly when the class-conditional density assumptions give a poor approximation to the true distributions.

4.3.1 Fixed basis functions

So far in this chapter, we have considered classification models that work directly with the original input vector $\mathbf{x}$. However, all of the algorithms are equally applicable if we first make a fixed nonlinear transformation of the inputs using a vector of basis functions $\boldsymbol{\phi}(\mathbf{x})$. The resulting decision boundaries will be linear in the feature space $\boldsymbol{\phi}$, and these correspond to nonlinear decision boundaries in the original $\mathbf{x}$ space, as illustrated in Figure 4.12.
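A minimal sketch of such a fixed transformation, assuming isotropic Gaussian basis functions of the form $\phi_j(\mathbf{x}) = \exp\{-\|\mathbf{x} - \mathbf{c}_j\|^2 / (2s^2)\}$: the centres, the width `s`, and the weights `w`, `w0` below are hypothetical values chosen for illustration, not those used in Figure 4.12.

```python
import math

def gaussian_basis(x, centre, s):
    """Isotropic Gaussian basis function centred at `centre` with width s."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2 * s ** 2))

# Hypothetical centres and width for two basis functions in a 2-D input space.
centres = [(-1.0, -1.0), (0.0, 0.0)]
s = 0.5

def phi(x):
    """Map an input vector x to the feature vector (phi_1(x), phi_2(x))."""
    return [gaussian_basis(x, c, s) for c in centres]

# Hypothetical weights: a function that is linear in the features phi(x)
# but nonlinear in the original input x.
w = [2.0, -3.0]
w0 = 0.5

def activation(x):
    """Linear function of phi(x); its zero set is the decision boundary."""
    return w0 + sum(wj * pj for wj, pj in zip(w, phi(x)))

for point in [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]:
    print(point, phi(point), activation(point))
```

The set where `activation(x) == 0` is a straight line in the $(\phi_1, \phi_2)$ plane but a curve in the $(x_1, x_2)$ plane, which is the effect the figure illustrates.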
