The values of $z_k$ therefore satisfy $z_k \in \{0, 1\}$ and $\sum_k z_k = 1$, and we see that there are $K$ possible states for the vector $\mathbf{z}$ according to which element is nonzero. We shall define the joint distribution $p(\mathbf{x}, \mathbf{z})$ in terms of a marginal distribution $p(\mathbf{z})$ and a conditional distribution $p(\mathbf{x}|\mathbf{z})$, corresponding to the graphical model in Figure 9.4. The marginal distribution over $\mathbf{z}$ is specified in terms of the mixing coefficients $\pi_k$, such that
$$p(z_k = 1) = \pi_k$$
where the parameters $\{\pi_k\}$ must satisfy
$$0 \leqslant \pi_k \leqslant 1 \tag{9.8}$$
together with
$$\sum_{k=1}^{K} \pi_k = 1 \tag{9.9}$$
in order to be valid probabilities.

Figure 9.4: Graphical representation of a mixture model, in which the joint distribution is expressed in the form $p(\mathbf{x}, \mathbf{z}) = p(\mathbf{z})p(\mathbf{x}|\mathbf{z})$.
Because $\mathbf{z}$ uses a 1-of-$K$ representation, we can also write this distribution in the form
$$p(\mathbf{z}) = \prod_{k=1}^{K} \pi_k^{z_k}. \tag{9.10}$$
Similarly, the conditional distribution of $\mathbf{x}$ given a particular value for $\mathbf{z}$ is a Gaussian
$$p(\mathbf{x}|z_k = 1) = \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
which can also be written in the form
$$p(\mathbf{x}|\mathbf{z}) = \prod_{k=1}^{K} \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)^{z_k}. \tag{9.11}$$
(Exercise 9.3) The joint distribution is given by $p(\mathbf{z})p(\mathbf{x}|\mathbf{z})$, and the marginal distribution of $\mathbf{x}$ is then obtained by summing the joint distribution over all possible states of $\mathbf{z}$ to give
$$p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{z})p(\mathbf{x}|\mathbf{z}) = \sum_{k=1}^{K} \pi_k \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \tag{9.12}$$
where we have made use of (9.10) and (9.11). Thus the marginal distribution of $\mathbf{x}$ is a Gaussian mixture of the form (9.7).
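To make (9.12) concrete, here is a minimal NumPy/SciPy sketch that evaluates a Gaussian mixture density; the function name and the toy parameter values are illustrative choices, not taken from the text.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_mixture_pdf(x, pis, mus, Sigmas):
    """Evaluate the mixture density (9.12): p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))

# A small, arbitrary two-component mixture in two dimensions.
pis = np.array([0.4, 0.6])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
print(gaussian_mixture_pdf(np.array([1.0, 1.0]), pis, mus, Sigmas))
```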
If we have several observations $\mathbf{x}_1, \ldots, \mathbf{x}_N$, then, because we have represented the marginal distribution in the form $p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{x}, \mathbf{z})$, it follows that for every observed data point $\mathbf{x}_n$ there is a corresponding latent variable $\mathbf{z}_n$.

We have therefore found an equivalent formulation of the Gaussian mixture involving an explicit latent variable. It might seem that we have not gained much by doing so. However, we are now able to work with the joint distribution $p(\mathbf{x}, \mathbf{z})$ instead of the marginal distribution $p(\mathbf{x})$, and this will lead to significant simplifications, most notably through the introduction of the expectation-maximization (EM) algorithm.

Another quantity that will play an important role is the conditional probability of $\mathbf{z}$ given $\mathbf{x}$. We shall use $\gamma(z_k)$ to denote $p(z_k = 1|\mathbf{x})$, whose value can be found using Bayes' theorem
$$\gamma(z_k) \equiv p(z_k = 1|\mathbf{x}) = \frac{p(z_k = 1)p(\mathbf{x}|z_k = 1)}{\displaystyle\sum_{j=1}^{K} p(z_j = 1)p(\mathbf{x}|z_j = 1)} = \frac{\pi_k \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\displaystyle\sum_{j=1}^{K} \pi_j \mathcal{N}(\mathbf{x}|\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}. \tag{9.13}$$
(Section 8.1.2) We shall view $\pi_k$ as the prior probability of $z_k = 1$, and the quantity $\gamma(z_k)$ as the corresponding posterior probability once we have observed $\mathbf{x}$.
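A minimal sketch of how the posterior probabilities (9.13) might be computed for a whole data set, working in log space for numerical stability; the function name and the array layout are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def responsibilities(X, pis, mus, Sigmas):
    """Posterior p(z_k = 1 | x_n) of (9.13) for every row of X, computed in log space."""
    # log [ pi_k N(x_n | mu_k, Sigma_k) ] for every point n and component k
    log_weighted = np.stack(
        [np.log(pis[k]) + multivariate_normal.logpdf(X, mean=mus[k], cov=Sigmas[k])
         for k in range(len(pis))], axis=1)                      # shape (N, K)
    # Normalise over components, i.e. divide by the denominator of (9.13).
    return np.exp(log_weighted - logsumexp(log_weighted, axis=1, keepdims=True))
```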
As we shall see later, $\gamma(z_k)$ can also be viewed as the responsibility that component $k$ takes for 'explaining' the observation $\mathbf{x}$.

We can use the technique of ancestral sampling to generate random samples distributed according to the Gaussian mixture model. To do this, we first generate a value for $\mathbf{z}$, which we denote $\widehat{\mathbf{z}}$, from the marginal distribution $p(\mathbf{z})$ and then generate a value for $\mathbf{x}$ from the conditional distribution $p(\mathbf{x}|\widehat{\mathbf{z}})$. Techniques for sampling from standard distributions are discussed in Chapter 11. We can depict samples from the joint distribution $p(\mathbf{x}, \mathbf{z})$ by plotting points at the corresponding values of $\mathbf{x}$ and then colouring them according to the value of $\mathbf{z}$, in other words according to which Gaussian component was responsible for generating them, as shown in Figure 9.5(a). Similarly, samples from the marginal distribution $p(\mathbf{x})$ are obtained by taking the samples from the joint distribution and ignoring the values of $\mathbf{z}$.
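One possible implementation of this two-stage ancestral sampling procedure in NumPy; the function name and parameter conventions are illustrative.

```python
import numpy as np

def sample_mixture(n_samples, pis, mus, Sigmas, seed=0):
    """Ancestral sampling: draw z from p(z), then x from p(x|z)."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pis), size=n_samples, p=pis)   # latent component index for each sample
    x = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return x, z   # keeping z as well gives the 'complete' data set of Figure 9.5(a)
```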
These are illustrated in Figure 9.5(b) by plotting the $\mathbf{x}$ values without any coloured labels. We can also use this synthetic data set to illustrate the 'responsibilities' by evaluating, for every data point, the posterior probability for each component in the mixture distribution from which this data set was generated. In particular, we can represent the value of the responsibilities $\gamma(z_{nk})$ associated with data point $\mathbf{x}_n$ by plotting the corresponding point using proportions of red, blue, and green ink given by $\gamma(z_{nk})$ for $k = 1, 2, 3$, respectively, as shown in Figure 9.5(c).
So, for instance, a data point for which $\gamma(z_{n1}) = 1$ will be coloured red, whereas one for which $\gamma(z_{n2}) = \gamma(z_{n3}) = 0.5$ will be coloured with equal proportions of blue and green ink and so will appear cyan. This should be compared with Figure 9.5(a) in which the data points were labelled using the true identity of the component from which they were generated.
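One way to produce a plot like Figure 9.5(c) is to pass the responsibilities directly as per-point RGB colours. This sketch assumes 2-D data, three components, and the hypothetical responsibilities() helper sketched above; the red/blue/green assignment for k = 1, 2, 3 follows the text.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_responsibilities(X, gamma):
    """Scatter 2-D points coloured by responsibilities (red, blue, green ink for k = 1, 2, 3)."""
    # Matplotlib expects (red, green, blue) columns, so reorder the components as k = 1, 3, 2.
    rgb = np.stack([gamma[:, 0], gamma[:, 2], gamma[:, 1]], axis=1)
    plt.scatter(X[:, 0], X[:, 1], c=rgb, s=10)
    plt.show()
```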
Figure 9.5: Example of 500 points drawn from the mixture of 3 Gaussians shown in Figure 2.23. (a) Samples from the joint distribution $p(\mathbf{z})p(\mathbf{x}|\mathbf{z})$ in which the three states of $\mathbf{z}$, corresponding to the three components of the mixture, are depicted in red, green, and blue, and (b) the corresponding samples from the marginal distribution $p(\mathbf{x})$, which is obtained by simply ignoring the values of $\mathbf{z}$ and just plotting the $\mathbf{x}$ values. The data set in (a) is said to be complete, whereas that in (b) is incomplete. (c) The same samples in which the colours represent the value of the responsibilities $\gamma(z_{nk})$ associated with data point $\mathbf{x}_n$, obtained by plotting the corresponding point using proportions of red, blue, and green ink given by $\gamma(z_{nk})$ for $k = 1, 2, 3$, respectively.

9.2.1 Maximum likelihood

Suppose we have a data set of observations $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, and we wish to model this data using a mixture of Gaussians. We can represent this data set as an $N \times D$ matrix $\mathbf{X}$ in which the $n$th row is given by $\mathbf{x}_n^{\mathrm{T}}$. Similarly, the corresponding latent variables will be denoted by an $N \times K$ matrix $\mathbf{Z}$ with rows $\mathbf{z}_n^{\mathrm{T}}$. If we assume that the data points are drawn independently from the distribution, then we can express the Gaussian mixture model for this i.i.d. data set using the graphical representation shown in Figure 9.6. From (9.7) the log of the likelihood function is given by
$$\ln p(\mathbf{X}|\boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\}. \tag{9.14}$$
Before discussing how to maximize this function, it is worth emphasizing that there is a significant problem associated with the maximum likelihood framework applied to Gaussian mixture models, due to the presence of singularities.
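The log likelihood (9.14) could be evaluated along the following lines; computing each inner sum with log-sum-exp avoids underflow when the individual component densities are very small. Names and array layout are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """Log likelihood (9.14): sum_n ln { sum_k pi_k N(x_n | mu_k, Sigma_k) }."""
    log_weighted = np.stack(
        [np.log(pis[k]) + multivariate_normal.logpdf(X, mean=mus[k], cov=Sigmas[k])
         for k in range(len(pis))], axis=1)          # shape (N, K)
    return logsumexp(log_weighted, axis=1).sum()     # log-sum-exp over k, then sum over n
```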
Figure 9.6: Graphical representation of a Gaussian mixture model for a set of $N$ i.i.d. data points $\{\mathbf{x}_n\}$, with corresponding latent points $\{\mathbf{z}_n\}$, where $n = 1, \ldots, N$.

Figure 9.7: Illustration of how singularities in the likelihood function arise with mixtures of Gaussians. This should be compared with the case of a single Gaussian shown in Figure 1.14 for which no singularities arise.

For simplicity, consider a Gaussian mixture whose components have covariance matrices given by $\boldsymbol{\Sigma}_k = \sigma_k^2\mathbf{I}$, where $\mathbf{I}$ is the unit matrix, although the conclusions will hold for general covariance matrices. Suppose that one of the components of the mixture model, let us say the $j$th component, has its mean $\boldsymbol{\mu}_j$ exactly equal to one of the data points so that $\boldsymbol{\mu}_j = \mathbf{x}_n$ for some value of $n$. This data point will then contribute a term in the likelihood function of the form
$$\mathcal{N}(\mathbf{x}_n|\mathbf{x}_n, \sigma_j^2\mathbf{I}) = \frac{1}{(2\pi)^{1/2}} \frac{1}{\sigma_j}. \tag{9.15}$$
(Section 10.1) If we consider the limit $\sigma_j \to 0$, then we see that this term goes to infinity and so the log likelihood function will also go to infinity.
Thus the maximization ofthe log likelihood function is not a well posed problem because such singularitieswill always be present and will occur whenever one of the Gaussian components‘collapses’ onto a specific data point. Recall that this problem did not arise in thecase of a single Gaussian distribution. To understand the difference, note that if asingle Gaussian collapses onto a data point it will contribute multiplicative factorsto the likelihood function arising from the other data points and these factors will goto zero exponentially fast, giving an overall likelihood that goes to zero rather thaninfinity.
However, once we have (at least) two components in the mixture, one of the components can have a finite variance and therefore assign finite probability to all of the data points while the other component can shrink onto one specific data point and thereby contribute an ever increasing additive value to the log likelihood. This is illustrated in Figure 9.7. These singularities provide another example of the severe over-fitting that can occur in a maximum likelihood approach. We shall see that this difficulty does not occur if we adopt a Bayesian approach. For the moment, however, we simply note that in applying maximum likelihood to Gaussian mixture models we must take steps to avoid finding such pathological solutions and instead seek local maxima of the likelihood function that are well behaved. We can hope to avoid the singularities by using suitable heuristics, for instance by detecting when a Gaussian component is collapsing and resetting its mean to a randomly chosen value while also resetting its covariance to some large value, and then continuing with the optimization (one possible form of such a reset is sketched below).

A further issue in finding maximum likelihood solutions arises from the fact that for any given maximum likelihood solution, a $K$-component mixture will have a total of $K!$ equivalent solutions corresponding to the $K!$ ways of assigning $K$ sets of parameters to $K$ components.
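Returning to the reset heuristic mentioned above, one hypothetical implementation might look like the following; the collapse test, the thresholds, and the re-initialisation scheme are all assumptions for illustration rather than anything prescribed by the text.

```python
import numpy as np

def reset_collapsing_components(mus, Sigmas, X, min_var=1e-6, reset_scale=1.0, seed=None):
    """Heuristic reset of mixture components whose covariance has (nearly) collapsed.

    Collapse is detected by a very small smallest eigenvalue of the covariance; the
    mean is then re-initialised at random and the covariance reset to a large value.
    """
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    for k in range(len(mus)):
        if np.linalg.eigvalsh(Sigmas[k]).min() < min_var:     # component has collapsed
            mus[k] = X[rng.integers(len(X))] + rng.normal(scale=0.1, size=D)  # random new mean
            Sigmas[k] = reset_scale * np.eye(D)               # reset to a large covariance
    return mus, Sigmas
```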