Bishop C.M. Pattern Recognition and Machine Learning (2006)
Calculate the number of distinct directed trees that can be constructed from a given undirected tree.

8.19 Apply the sum-product algorithm derived in Section 8.4.4 to the chain-of-nodes model discussed in Section 8.4.1 and show that the results (8.54), (8.55), and (8.57) are recovered as a special case.

8.20 www Consider the message passing protocol for the sum-product algorithm on a tree-structured factor graph in which messages are first propagated from the leaves to an arbitrarily chosen root node and then from the root node out to the leaves. Use proof by induction to show that the messages can be passed in such an order that, at every step, each node that must send a message has received all of the incoming messages necessary to construct its outgoing messages.

8.21 www Show that the marginal distributions p(x_s) over the sets of variables x_s associated with each of the factors f_s(x_s) in a factor graph can be found by first running the sum-product message passing algorithm and then evaluating the required marginals using (8.72).

8.22 Consider a tree-structured factor graph in which a given subset of the variable nodes forms a connected subgraph (i.e., any variable node of the subset is connected to at least one of the other variable nodes via a single factor node). Show how the sum-product algorithm can be used to compute the marginal distribution over that subset.

8.23 www In Section 8.4.4, we showed that the marginal distribution p(x_i) for a variable node x_i in a factor graph is given by the product of the messages arriving at this node from neighbouring factor nodes in the form (8.63). Show that the marginal p(x_i) can also be written as the product of the incoming message along any one of the links with the outgoing message along the same link.

8.24 Show that the marginal distribution for the variables x_s in a factor f_s(x_s) in a tree-structured factor graph, after running the sum-product message passing algorithm, can be written as the product of the messages arriving at the factor node along all its links, times the local factor f_s(x_s), in the form (8.72).

8.25 In (8.86), we verified that the sum-product algorithm run on the graph in Figure 8.51, with node x_3 designated as the root node, gives the correct marginal for x_2. Show that the correct marginals are obtained also for x_1 and x_3. Similarly, show that the use of the result (8.72) after running the sum-product algorithm on this graph gives the correct joint distribution for x_1, x_2.

8.26 Consider a tree-structured factor graph over discrete variables, and suppose we wish to evaluate the joint distribution p(x_a, x_b) associated with two variables x_a and x_b that do not belong to a common factor. Define a procedure for using the sum-product algorithm to evaluate this joint distribution in which one of the variables is successively clamped to each of its allowed values.

8.27 Consider two discrete variables x and y, each having three possible states, for example x, y ∈ {0, 1, 2}. Construct a joint distribution p(x, y) over these variables having the property that the value x̂ that maximizes the marginal p(x), together with the value ŷ that maximizes the marginal p(y), have probability zero under the joint distribution, so that p(x̂, ŷ) = 0.

8.28 www The concept of a pending message in the sum-product algorithm for a factor graph was defined in Section 8.4.7. Show that if the graph has one or more cycles, there will always be at least one pending message, irrespective of how long the algorithm runs.

8.29 www Show that if the sum-product algorithm is run on a factor graph with a tree structure (no loops), then after a finite number of messages have been sent, there will be no pending messages.

9. Mixture Models and EM

If we define a joint distribution over observed and latent variables, the corresponding distribution of the observed variables alone is obtained by marginalization.
This allows relatively complex marginal distributions over observed variables to be expressed in terms of more tractable joint distributions over the expanded space of observed and latent variables. The introduction of latent variables thereby allows complicated distributions to be formed from simpler components. In this chapter, we shall see that mixture distributions, such as the Gaussian mixture discussed in Section 2.3.9, can be interpreted in terms of discrete latent variables. Continuous latent variables will form the subject of Chapter 12.

As well as providing a framework for building more complex probability distributions, mixture models can also be used to cluster data. We therefore begin our discussion of mixture distributions by considering the problem of finding clusters in a set of data points, which we approach first using a nonprobabilistic technique called the K-means algorithm (Lloyd, 1982). Then we introduce the latent variable view of mixture distributions, in which the discrete latent variables can be interpreted as defining assignments of data points to specific components of the mixture. A general technique for finding maximum likelihood estimators in latent variable models is the expectation-maximization (EM) algorithm. We first use the Gaussian mixture distribution to motivate the EM algorithm in a fairly informal way, and then we give a more careful treatment based on the latent variable viewpoint. We shall see that the K-means algorithm corresponds to a particular nonprobabilistic limit of EM applied to mixtures of Gaussians. Finally, we discuss EM in some generality.

Gaussian mixture models are widely used in data mining, pattern recognition, machine learning, and statistical analysis.
In many applications, their parameters are determined by maximum likelihood, typically using the EM algorithm. However, as we shall see, there are some significant limitations to the maximum likelihood approach, and in Chapter 10 we shall show that an elegant Bayesian treatment can be given using the framework of variational inference. This requires little additional computation compared with EM, and it resolves the principal difficulties of maximum likelihood while also allowing the number of components in the mixture to be inferred automatically from the data.

9.1. K-means Clustering

We begin by considering the problem of identifying groups, or clusters, of data points in a multidimensional space. Suppose we have a data set {x_1, ..., x_N} consisting of N observations of a random D-dimensional Euclidean variable x. Our goal is to partition the data set into some number K of clusters, where we shall suppose for the moment that the value of K is given. Intuitively, we might think of a cluster as comprising a group of data points whose inter-point distances are small compared with the distances to points outside of the cluster. We can formalize this notion by first introducing a set of D-dimensional vectors µ_k, where k = 1, ..., K, in which µ_k is a prototype associated with the kth cluster. As we shall see shortly, we can think of the µ_k as representing the centres of the clusters. Our goal is then to find an assignment of data points to clusters, as well as a set of vectors {µ_k}, such that the sum of the squares of the distances of each data point to its closest vector µ_k is a minimum.

It is convenient at this point to define some notation to describe the assignment of data points to clusters. For each data point x_n, we introduce a corresponding set of binary indicator variables r_nk ∈ {0, 1}, where k = 1, ..., K, describing which of the K clusters the data point x_n is assigned to, so that if data point x_n is assigned to cluster k then r_nk = 1, and r_nj = 0 for j ≠ k. This is known as the 1-of-K coding scheme. We can then define an objective function, sometimes called a distortion measure, given by

    J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2                    (9.1)

which represents the sum of the squares of the distances of each data point to its assigned vector µ_k. Our goal is to find values for the {r_nk} and the {µ_k} so as to minimize J.
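As a small illustration of the notation, the sketch below (not from the text; it assumes NumPy and uses hypothetical names X, R, and mu for the data matrix, the 1-of-K indicator matrix, and the prototypes) evaluates the distortion measure (9.1) for a given assignment.

```python
import numpy as np

def distortion(X, R, mu):
    """Distortion measure J of (9.1).

    X  : (N, D) array of data points x_n
    R  : (N, K) array of 1-of-K indicators r_nk
    mu : (K, D) array of cluster prototypes mu_k
    """
    # Squared Euclidean distance from every point to every prototype, shape (N, K).
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Sum only the distances selected by the indicator variables.
    return float((R * sq_dists).sum())
```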
We can do this through an iterative procedure in which each iteration involves two successive steps corresponding to successive optimizations with respect to the r_nk and the µ_k. First we choose some initial values for the µ_k. Then in the first phase we minimize J with respect to the r_nk, keeping the µ_k fixed. In the second phase we minimize J with respect to the µ_k, keeping the r_nk fixed. This two-stage optimization is then repeated until convergence. We shall see that these two stages of updating r_nk and updating µ_k correspond respectively to the E (expectation) and M (maximization) steps of the EM algorithm, and to emphasize this we shall use the terms E step and M step in the context of the K-means algorithm.

Consider first the determination of the r_nk. Because J in (9.1) is a linear function of r_nk, this optimization can be performed easily to give a closed-form solution. The terms involving different n are independent, and so we can optimize for each n separately by choosing r_nk to be 1 for whichever value of k gives the minimum value of ||x_n − µ_k||^2. In other words, we simply assign the nth data point to the closest cluster centre. More formally, this can be expressed as

    r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \| x_n - \mu_j \|^2 \\ 0 & \text{otherwise.} \end{cases}        (9.2)

Now consider the optimization of the µ_k with the r_nk held fixed. The objective function J is a quadratic function of µ_k, and it can be minimized by setting its derivative with respect to µ_k to zero, giving

    2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) = 0                    (9.3)

which we can easily solve for µ_k to give

    \mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}.                    (9.4)

The denominator in this expression is equal to the number of points assigned to cluster k, and so this result has a simple interpretation: set µ_k equal to the mean of all of the data points x_n assigned to cluster k. For this reason, the procedure is known as the K-means algorithm.

The two phases of re-assigning data points to clusters and re-computing the cluster means are repeated in turn until there is no further change in the assignments (or until some maximum number of iterations is exceeded).
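The updates (9.2) and (9.4) translate directly into code. The following is a minimal sketch of this procedure, not an implementation from the book: the function name, the random initialization from K distinct data points, and the handling of empty clusters (keeping the previous mean) are choices made here for illustration, assuming NumPy.

```python
import numpy as np

def kmeans(X, K, max_iter=100, rng=None):
    """Minimal K-means sketch following (9.2)-(9.4).

    Returns the prototypes mu (K, D) and hard assignments (N,).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    N, D = X.shape
    # Initialize the mu_k by picking K distinct data points at random.
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    assign = np.full(N, -1)

    for _ in range(max_iter):
        # E step (9.2): assign each point to its nearest prototype.
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_assign = sq_dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # no assignment changed, so J can decrease no further
        assign = new_assign

        # M step (9.4): set each mu_k to the mean of its assigned points.
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:          # keep old mu_k if the cluster is empty
                mu[k] = members.mean(axis=0)
    return mu, assign
```

Each pass performs one E step and one M step, and the loop stops when the assignments no longer change.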
Because each phase reduces the value of the objective function J, convergence of the algorithm is assured. However, it may converge to a local rather than global minimum of J. The convergence properties of the K-means algorithm were studied by MacQueen (1967).

The K-means algorithm is illustrated using the Old Faithful data set in Figure 9.1. For the purposes of this example, we have made a linear re-scaling of the data, known as standardizing, such that each of the variables has zero mean and unit standard deviation. For this example, we have chosen K = 2.

Figure 9.1 Illustration of the K-means algorithm using the re-scaled Old Faithful data set. (a) Green points denote the data set in a two-dimensional Euclidean space.
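A rough way to reproduce the flavour of this experiment, using synthetic two-dimensional data as a stand-in for the Old Faithful measurements (which are not included here) and the hypothetical kmeans function sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs standing in for the Old Faithful data.
X = np.vstack([
    rng.normal(loc=(-2.0, -2.0), scale=0.6, size=(150, 2)),
    rng.normal(loc=(+2.0, +2.0), scale=0.6, size=(150, 2)),
])
# Standardize: zero mean and unit standard deviation for each variable.
X = (X - X.mean(axis=0)) / X.std(axis=0)

mu, assign = kmeans(X, K=2, rng=0)
print("cluster means:\n", mu)
print("points per cluster:", np.bincount(assign, minlength=2))
```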