In other words, for any given (nondegenerate) point in the space of parameter values there will be a further K! − 1 additional points all of which give rise to exactly the same distribution. This problem is known as identifiability (Casella and Berger, 2002) and is an important issue when we wish to interpret the parameter values discovered by a model. Identifiability will also arise when we discuss models having continuous latent variables in Chapter 12. However, for the purposes of finding a good density model, it is irrelevant because any of the equivalent solutions is as good as any other.

Maximizing the log likelihood function (9.14) for a Gaussian mixture model turns out to be a more complex problem than for the case of a single Gaussian.
The difficulty arises from the presence of the summation over k that appears inside the logarithm in (9.14), so that the logarithm function no longer acts directly on the Gaussian. If we set the derivatives of the log likelihood to zero, we will no longer obtain a closed form solution, as we shall see shortly. One approach is to apply gradient-based optimization techniques (Fletcher, 1987; Nocedal and Wright, 1999; Bishop and Nabney, 2008). Although gradient-based techniques are feasible, and indeed will play an important role when we discuss mixture density networks in Chapter 5, we now consider an alternative approach known as the EM algorithm, which has broad applicability and which will lay the foundations for a discussion of variational inference techniques in Chapter 10.
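As an illustrative aside (not part of the original development), the quantity being maximized can be evaluated numerically as follows. This is only a sketch: it assumes NumPy and SciPy are available, and the function name and array shapes are choices made purely for the illustration.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pi, mu, Sigma):
    """Evaluate ln p(X | pi, mu, Sigma) as in (9.14).

    X     : (N, D) data matrix
    pi    : (K,)   mixing coefficients, summing to one
    mu    : (K, D) component means
    Sigma : (K, D, D) component covariances
    """
    K = len(pi)
    # log(pi_k) + log N(x_n | mu_k, Sigma_k) for every point and component: shape (N, K)
    log_terms = np.stack(
        [np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mu[k], cov=Sigma[k])
         for k in range(K)], axis=1)
    # The sum over k sits inside the logarithm, which is exactly why setting the
    # derivatives to zero does not yield a closed-form solution.
    return logsumexp(log_terms, axis=1).sum()
```

Gradient-based optimization of this quantity is possible, as noted above; the EM algorithm developed in the next subsection provides an alternative that works directly with the responsibilities.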
9.2.2 EM for Gaussian mixtures

An elegant and powerful method for finding maximum likelihood solutions for models with latent variables is called the expectation-maximization algorithm, or EM algorithm (Dempster et al., 1977; McLachlan and Krishnan, 1997). Later we shall give a general treatment of EM, and we shall also show how EM can be generalized to obtain the variational inference framework (Section 10.1). Initially, we shall motivate the EM algorithm by giving a relatively informal treatment in the context of the Gaussian mixture model. We emphasize, however, that EM has broad applicability, and indeed it will be encountered in the context of a variety of different models in this book.

Let us begin by writing down the conditions that must be satisfied at a maximum of the likelihood function.
Setting the derivatives of ln p(X|π, µ, Σ) in (9.14) with respect to the means µk of the Gaussian components to zero, we obtain

0 = -\sum_{n=1}^{N} \underbrace{\frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}}_{\gamma(z_{nk})} \, \Sigma_k^{-1} (x_n - \mu_k)    (9.16)

where we have made use of the form (2.43) for the Gaussian distribution. Note that the posterior probabilities, or responsibilities, given by (9.13) appear naturally on the right-hand side. Multiplying by Σk (which we assume to be nonsingular) and rearranging we obtain

\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n    (9.17)

where we have defined

N_k = \sum_{n=1}^{N} \gamma(z_{nk}).    (9.18)
We can interpret Nk as the effective number of points assigned to cluster k. Note carefully the form of this solution. We see that the mean µk for the kth Gaussian component is obtained by taking a weighted mean of all of the points in the data set, in which the weighting factor for data point xn is given by the posterior probability γ(znk) that component k was responsible for generating xn.

If we set the derivative of ln p(X|π, µ, Σ) with respect to Σk to zero, and follow a similar line of reasoning, making use of the result from Section 2.3.4 for the maximum likelihood solution for the covariance matrix of a single Gaussian, we obtain

\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^{\mathrm{T}}    (9.19)

which has the same form as the corresponding result for a single Gaussian fitted to the data set, but again with each data point weighted by the corresponding posterior probability and with the denominator given by the effective number of points associated with the corresponding component.
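As an illustrative aside (not from the text), the updates (9.17)–(9.19) amount to responsibility-weighted averages and can be written compactly once the responsibilities are available as a matrix. The function name, and the convention that gamma holds γ(znk) as an (N, K) array computed from (9.13), are assumptions of this sketch.

```python
import numpy as np

def m_step_means_covariances(X, gamma):
    """Re-estimate means and covariances from responsibilities, as in (9.17)-(9.19).

    X     : (N, D) data matrix
    gamma : (N, K) responsibilities gamma(z_nk) from (9.13), each row summing to one
    """
    Nk = gamma.sum(axis=0)                       # (9.18): effective number of points per component
    mu = (gamma.T @ X) / Nk[:, None]             # (9.17): responsibility-weighted means, shape (K, D)
    K, D = mu.shape
    Sigma = np.empty((K, D, D))
    for k in range(K):
        diff = X - mu[k]                         # deviations from the newly updated mean
        # (9.19): responsibility-weighted outer products, normalized by the effective count
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return Nk, mu, Sigma
```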
Finally, we maximize ln p(X|π, µ, Σ) with respect to the mixing coefficients πk. Here we must take account of the constraint (9.9), which requires the mixing coefficients to sum to one. This can be achieved using a Lagrange multiplier (Appendix E) and maximizing the following quantity

\ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)    (9.20)

which gives

0 = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda    (9.21)

where again we see the appearance of the responsibilities. If we now multiply both sides by πk and sum over k making use of the constraint (9.9), we find λ = −N. Using this to eliminate λ and rearranging we obtain

\pi_k = \frac{N_k}{N}    (9.22)

so that the mixing coefficient for the kth component is given by the average responsibility which that component takes for explaining the data points.
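Spelling out the elimination of λ described above (this is simply a restatement of the argument in the text): multiplying (9.21) by πk turns the fraction into the responsibility γ(znk), giving

0 = \sum_{n=1}^{N} \gamma(z_{nk}) + \lambda\, \pi_k = N_k + \lambda\, \pi_k .

Summing over k, and using the facts that the Nk sum to N and that the πk sum to one, gives 0 = N + λ, so that λ = −N; substituting back then yields πk = Nk/N, which is (9.22).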
It is worth emphasizing that the results (9.17), (9.19), and (9.22) do not constitute a closed-form solution for the parameters of the mixture model because the responsibilities γ(znk) depend on those parameters in a complex way through (9.13). However, these results do suggest a simple iterative scheme for finding a solution to the maximum likelihood problem, which as we shall see turns out to be an instance of the EM algorithm for the particular case of the Gaussian mixture model. We first choose some initial values for the means, covariances, and mixing coefficients. Then we alternate between the following two updates that we shall call the E step and the M step, for reasons that will become apparent shortly (Section 9.4). In the expectation step, or E step, we use the current values for the parameters to evaluate the posterior probabilities, or responsibilities, given by (9.13). We then use these probabilities in the maximization step, or M step, to re-estimate the means, covariances, and mixing coefficients using the results (9.17), (9.19), and (9.22).
Note that in so doing we first evaluate the new means using (9.17) and then use these new values to find the covariances using (9.19), in keeping with the corresponding result for a single Gaussian distribution. We shall show that each update to the parameters resulting from an E step followed by an M step is guaranteed to increase the log likelihood function. In practice, the algorithm is deemed to have converged when the change in the log likelihood function, or alternatively in the parameters, falls below some threshold.
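The alternation just described can be sketched in code as follows. This is an illustrative implementation rather than the book's own pseudocode; it assumes NumPy and SciPy, and the tolerance, iteration cap, and array shapes are arbitrary choices made for the sketch.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def em_gaussian_mixture(X, pi, mu, Sigma, tol=1e-6, max_iter=200):
    """Alternate E and M steps for a Gaussian mixture until the log likelihood stabilizes.

    X : (N, D) data; pi : (K,), mu : (K, D), Sigma : (K, D, D) are initial parameter values.
    """
    N, D = X.shape
    K = len(pi)
    Sigma = np.array(Sigma, dtype=float)          # copy so the initial values are not overwritten
    old_ll = -np.inf
    for _ in range(max_iter):
        # E step: evaluate responsibilities gamma(z_nk) from (9.13), working in log space
        log_terms = np.stack(
            [np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mu[k], cov=Sigma[k])
             for k in range(K)], axis=1)           # shape (N, K)
        log_norm = logsumexp(log_terms, axis=1, keepdims=True)
        gamma = np.exp(log_terms - log_norm)       # rows sum to one

        # M step: re-estimate the parameters using (9.17), (9.19), and (9.22),
        # evaluating the new means first and then the covariances
        Nk = gamma.sum(axis=0)                     # (9.18)
        mu = (gamma.T @ X) / Nk[:, None]           # (9.17)
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]   # (9.19)
        pi = Nk / N                                # (9.22)

        # Convergence test: stop when the change in the log likelihood (9.14) is small
        ll = log_norm.sum()
        if ll - old_ll < tol:
            break
        old_ll = ll
    return pi, mu, Sigma, ll
```

Initialized as described in the text, a loop of this form produces fits of the kind illustrated in Figure 9.8.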
[Figure 9.8 Illustration of the EM algorithm using the Old Faithful set as used for the illustration of the K-means algorithm in Figure 9.1. Panels (a)–(f) show successive stages of the fit, with L = 1, 2, 5, and 20 indicating the number of complete EM cycles. See the text for details.]

We illustrate the EM algorithm for a mixture of two Gaussians applied to the rescaled Old Faithful data set in Figure 9.8. Here a mixture of two Gaussians is used, with centres initialized using the same values as for the K-means algorithm in Figure 9.1, and with precision matrices initialized to be proportional to the unit matrix. Plot (a) shows the data points in green, together with the initial configuration of the mixture model in which the one standard-deviation contours for the two Gaussian components are shown as blue and red circles. Plot (b) shows the result of the initial E step, in which each data point is depicted using a proportion of blue ink equal to the posterior probability of having been generated from the blue component, and a corresponding proportion of red ink given by the posterior probability of having been generated by the red component.
Thus, points that have a significant probability for belonging to either cluster appear purple. The situation after the first M step is shown in plot (c), in which the mean of the blue Gaussian has moved to the mean of the data set, weighted by the probabilities of each data point belonging to the blue cluster, in other words it has moved to the centre of mass of the blue ink. Similarly, the covariance of the blue Gaussian is set equal to the covariance of the blue ink. Analogous results hold for the red component. Plots (d), (e), and (f) show the results after 2, 5, and 20 complete cycles of EM, respectively.
In plot (f) the algorithm is close to convergence.

Note that the EM algorithm takes many more iterations to reach (approximate) convergence compared with the K-means algorithm, and that each cycle requires significantly more computation. It is therefore common to run the K-means algorithm in order to find a suitable initialization for a Gaussian mixture model that is subsequently adapted using EM. The covariance matrices can conveniently be initialized to the sample covariances of the clusters found by the K-means algorithm, and the mixing coefficients can be set to the fractions of data points assigned to the respective clusters.
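A sketch of this initialization strategy might look as follows. It assumes scikit-learn's KMeans is available (the text does not prescribe any particular implementation), and the small ridge added to each covariance is an extra safeguard not mentioned in the text.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_from_kmeans(X, K, random_state=0):
    """Initialize mixture parameters from a K-means clustering, as suggested in the text."""
    N, D = X.shape
    km = KMeans(n_clusters=K, n_init=10, random_state=random_state).fit(X)
    mu = km.cluster_centers_.copy()               # means: the K-means cluster centres
    pi = np.empty(K)
    Sigma = np.empty((K, D, D))
    for k in range(K):
        Xk = X[km.labels_ == k]
        pi[k] = len(Xk) / N                       # mixing coefficient: fraction of points in cluster k
        # covariance: maximum likelihood (biased) sample covariance of the cluster, plus a
        # small ridge (not from the text) so that a tiny cluster still gives a valid matrix
        Sigma[k] = np.cov(Xk, rowvar=False, bias=True) + 1e-6 * np.eye(D)
    return pi, mu, Sigma
```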
As with gradient-based approaches for maximizing the log likelihood, techniques must be employed to avoid singularities of the likelihood function in which a Gaussian component collapses onto a particular data point. It should be emphasized that there will generally be multiple local maxima of the log likelihood function, and that EM is not guaranteed to find the largest of these maxima. Because the EM algorithm for Gaussian mixtures plays such an important role, we summarize it below.

EM for Gaussian Mixtures

Given a Gaussian mixture model, the goal is to maximize the likelihood function with respect to the parameters (comprising the means and covariances of the components and the mixing coefficients).

1.