Bishop C.M. Pattern Recognition and Machine Learning (2006), page 92

A graph is said to be a D map (for 'dependency map') of a distribution if every conditional independence statement satisfied by the distribution is reflected in the graph. Thus a completely disconnected graph (no links) will be a trivial D map for any distribution.

Alternatively, we can consider a specific distribution and ask which graphs have the appropriate conditional independence properties. If every conditional independence statement implied by a graph is satisfied by a specific distribution, then the graph is said to be an I map (for 'independence map') of that distribution.

Clearly a fully connected graph will be a trivial I map for any distribution.

If it is the case that every conditional independence property of the distribution is reflected in the graph, and vice versa, then the graph is said to be a perfect map for that distribution. A perfect map is therefore both an I map and a D map.

Figure 8.34 Venn diagram illustrating the set of all distributions P over a given set of variables, together with the set of distributions D that can be represented as a perfect map using a directed graph, and the set U that can be represented as a perfect map using an undirected graph.

Figure 8.35 A directed graph whose conditional independence properties cannot be expressed using an undirected graph over the same three variables.

Consider the set of distributions such that for each distribution there exists a directed graph that is a perfect map.

This set is distinct from the set of distributions such that for each distribution there exists an undirected graph that is a perfect map. In addition there are distributions for which neither directed nor undirected graphs offer a perfect map. This is illustrated as a Venn diagram in Figure 8.34.

Figure 8.35 shows an example of a directed graph that is a perfect map for a distribution satisfying the conditional independence properties A ⊥⊥ B | ∅ and A ̸⊥⊥ B | C.

There is no corresponding undirected graph over the same three variables that is a perfect map.

Conversely, consider the undirected graph over four variables shown in Figure 8.36. This graph exhibits the properties A ̸⊥⊥ B | ∅, C ⊥⊥ D | A ∪ B and A ⊥⊥ B | C ∪ D. There is no directed graph over four variables that implies the same set of conditional independence properties.

The graphical framework can be extended in a consistent way to graphs that include both directed and undirected links. These are called chain graphs (Lauritzen and Wermuth, 1989; Frydenberg, 1990), and contain the directed and undirected graphs considered so far as special cases.

Although such graphs can represent a broader class of distributions than either directed or undirected alone, there remain distributions for which even a chain graph cannot provide a perfect map. Chain graphs are not discussed further in this book.

Figure 8.36 An undirected graph whose conditional independence properties cannot be expressed in terms of a directed graph over the same variables.

8.4. Inference in Graphical Models

We turn now to the problem of inference in graphical models, in which some of the nodes in a graph are clamped to observed values, and we wish to compute the posterior distributions of one or more subsets of other nodes. As we shall see, we can exploit the graphical structure both to find efficient algorithms for inference, and

to make the structure of those algorithms transparent. Specifically, we shall see that many algorithms can be expressed in terms of the propagation of local messages around the graph. In this section, we shall focus primarily on techniques for exact inference, and in Chapter 10 we shall consider a number of approximate inference algorithms.

Figure 8.37 A graphical representation of Bayes' theorem. See the text for details.

To start with, let us consider the graphical interpretation of Bayes' theorem. Suppose we decompose the joint distribution p(x, y) over two variables x and y into a product of factors in the form p(x, y) = p(x)p(y|x).

This can be represented by the directed graph shown in Figure 8.37(a). Now suppose we observe the value of y, as indicated by the shaded node in Figure 8.37(b). We can view the marginal distribution p(x) as a prior over the latent variable x, and our goal is to infer the corresponding posterior distribution over x. Using the sum and product rules of probability we can evaluate

p(y) = Σ_{x′} p(y|x′) p(x′)    (8.47)

which can then be used in Bayes' theorem to calculate

p(x|y) = p(y|x) p(x) / p(y).    (8.48)

Thus the joint distribution is now expressed in terms of p(y) and p(x|y). From a graphical perspective, the joint distribution p(x, y) is now represented by the graph shown in Figure 8.37(c), in which the direction of the arrow is reversed. This is the simplest example of an inference problem for a graphical model.

8.4.1 Inference on a chain

Now consider a more complex problem involving the chain of nodes of the form shown in Figure 8.32.

This example will lay the foundation for a discussion of exact inference in more general graphs later in this section. Specifically, we shall consider the undirected graph in Figure 8.32(b). We have already seen that the directed chain can be transformed into an equivalent undirected chain.

Because the directed graph does not have any nodes with more than one parent, this does not require the addition of any extra links, and the directed and undirected versions of this graph express exactly the same set of conditional independence statements.

The joint distribution for this graph takes the form

p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) · · · ψ_{N−1,N}(x_{N−1}, x_N).    (8.49)

We shall consider the specific case in which the N nodes represent discrete variables each having K states, in which case each potential function ψ_{n−1,n}(x_{n−1}, x_n) comprises a K × K table, and so the joint distribution has (N − 1)K² parameters.

Let us consider the inference problem of finding the marginal distribution p(x_n) for a specific node x_n that is part way along the chain.

Note that, for the moment, there are no observed nodes. By definition, the required marginal is obtained by summing the joint distribution over all variables except x_n, so that

p(x_n) = Σ_{x_1} · · · Σ_{x_{n−1}} Σ_{x_{n+1}} · · · Σ_{x_N} p(x).    (8.50)

In a naive implementation, we would first evaluate the joint distribution and then perform the summations explicitly.
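Such a naive evaluation of (8.50) can be sketched directly. The pairwise potentials below are hypothetical random tables, used only to make the sum concrete; they stand in for the ψ_{n−1,n} of (8.49).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, K = 5, 3                                    # chain length, states per node
# psi[i] is the potential linking node i and node i+1 (0-based indexing)
psi = [rng.uniform(0.1, 1.0, (K, K)) for _ in range(N - 1)]

n = 2                                          # marginalize onto this node
p_n = np.zeros(K)
for x in product(range(K), repeat=N):          # all K**N joint configurations
    joint = 1.0
    for i in range(N - 1):
        joint *= psi[i][x[i], x[i + 1]]        # unnormalized joint (8.49), without 1/Z
    p_n[x[n]] += joint                         # sum out every variable except x_n
p_n /= p_n.sum()                               # normalize, absorbing the factor 1/Z
print(p_n)                                     # the marginal p(x_n), a vector of K values
```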

The joint distribution can be represented as a set of numbers, one for each possible value for x. Because there are N variables each with K states, there are K^N values for x and so evaluation and storage of the joint distribution, as well as marginalization to obtain p(x_n), all involve storage and computation that scale exponentially with the length N of the chain.

We can, however, obtain a much more efficient algorithm by exploiting the conditional independence properties of the graphical model. If we substitute the factorized expression (8.49) for the joint distribution into (8.50), then we can rearrange the order of the summations and the multiplications to allow the required marginal to be evaluated much more efficiently. Consider for instance the summation over x_N.

The potential ψ_{N−1,N}(x_{N−1}, x_N) is the only one that depends on x_N, and so we can perform the summation

Σ_{x_N} ψ_{N−1,N}(x_{N−1}, x_N)    (8.51)

first to give a function of x_{N−1}. We can then use this to perform the summation over x_{N−1}, which will involve only this new function together with the potential ψ_{N−2,N−1}(x_{N−2}, x_{N−1}), because this is the only other place that x_{N−1} appears. Similarly, the summation over x_1 involves only the potential ψ_{1,2}(x_1, x_2) and so can be performed separately to give a function of x_2, and so on. Because each summation effectively removes a variable from the distribution, this can be viewed as the removal of a node from the graph.

If we group the potentials and summations together in this way, we can express the desired marginal in the form

p(x_n) = (1/Z) [ Σ_{x_{n−1}} ψ_{n−1,n}(x_{n−1}, x_n) · · · [ Σ_{x_2} ψ_{2,3}(x_2, x_3) [ Σ_{x_1} ψ_{1,2}(x_1, x_2) ] ] · · · ]
              × [ Σ_{x_{n+1}} ψ_{n,n+1}(x_n, x_{n+1}) · · · [ Σ_{x_N} ψ_{N−1,N}(x_{N−1}, x_N) ] · · · ]    (8.52)

in which the first bracketed factor is denoted µ_α(x_n) and the second µ_β(x_n). The reader is encouraged to study this re-ordering carefully as the underlying idea forms the basis for the later discussion of the general sum-product algorithm.

Here the key concept that we are exploiting is that multiplication is distributive over addition, so that

ab + ac = a(b + c)    (8.53)

in which the left-hand side involves three arithmetic operations whereas the right-hand side reduces this to two operations.

Let us work out the computational cost of evaluating the required marginal using this re-ordered expression. We have to perform N − 1 summations each of which is over K states and each of which involves a function of two variables. For instance, the summation over x_1 involves only the function ψ_{1,2}(x_1, x_2), which is a table of K × K numbers.

We have to sum this table over x_1 for each value of x_2 and so this has O(K²) cost. The resulting vector of K numbers is multiplied by the matrix of numbers ψ_{2,3}(x_2, x_3) and so is again O(K²). Because there are N − 1 summations and multiplications of this kind, the total cost of evaluating the marginal p(x_n) is O(NK²). This is linear in the length of the chain, in contrast to the exponential cost of a naive approach. We have therefore been able to exploit the many conditional independence properties of this simple graph in order to obtain an efficient calculation.
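The re-ordered computation of (8.52) reduces to a sequence of matrix-vector products, one per summation, and can be checked against the exponential-cost naive sum on a short chain. The random K × K potentials here are hypothetical, chosen only to make the comparison concrete.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, K = 5, 3
psi = [rng.uniform(0.1, 1.0, (K, K)) for _ in range(N - 1)]  # psi[i] links nodes i, i+1
n = 2                                             # target node (0-based)

# Naive marginal: explicit sum over all K**N configurations, O(K^N).
naive = np.zeros(K)
for x in product(range(K), repeat=N):
    joint = 1.0
    for i in range(N - 1):
        joint *= psi[i][x[i], x[i + 1]]
    naive[x[n]] += joint
naive /= naive.sum()

# Re-ordered sums (8.52): each step is one O(K^2) matrix-vector product.
mu_alpha = np.ones(K)                             # running sum from the x_1 end
for i in range(n):
    mu_alpha = psi[i].T @ mu_alpha
mu_beta = np.ones(K)                              # running sum from the x_N end
for i in range(N - 2, n - 1, -1):
    mu_beta = psi[i] @ mu_beta
efficient = mu_alpha * mu_beta
efficient /= efficient.sum()                      # normalization absorbs 1/Z

print(np.allclose(naive, efficient))              # True
```

Each `@` product performs one of the N − 1 summations, which is where the O(NK²) total cost comes from.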

If the graph had been fully connected, there would have been no conditional independence properties, and we would have been forced to work directly with the full joint distribution.

We now give a powerful interpretation of this calculation in terms of the passing of local messages around on the graph. From (8.52) we see that the expression for the marginal p(x_n) decomposes into the product of two factors times the normalization constant

p(x_n) = (1/Z) µ_α(x_n) µ_β(x_n).    (8.54)

We shall interpret µ_α(x_n) as a message passed forwards along the chain from node x_{n−1} to node x_n.

Similarly, µ_β(x_n) can be viewed as a message passed backwards along the chain to node x_n from node x_{n+1}.

Figure 8.38 The marginal distribution p(x_n) for a node x_n along the chain is obtained by multiplying the two messages µ_α(x_n) and µ_β(x_n), and then normalizing. These messages can themselves be evaluated recursively by passing messages from both ends of the chain towards node x_n.

Note that each of the messages comprises a set of K values, one for each choice of x_n, and so the product of two messages should be interpreted as the point-wise multiplication of the elements of the two messages to give another set of K values.

The message µ_α(x_n) can be evaluated recursively because

µ_α(x_n) = Σ_{x_{n−1}} ψ_{n−1,n}(x_{n−1}, x_n) [ Σ_{x_{n−2}} · · · ]
         = Σ_{x_{n−1}} ψ_{n−1,n}(x_{n−1}, x_n) µ_α(x_{n−1}).    (8.55)

We therefore first evaluate

µ_α(x_2) = Σ_{x_1} ψ_{1,2}(x_1, x_2)    (8.56)

and then apply (8.55) repeatedly until we reach the desired node.
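The recursion (8.55), started from (8.56), together with its mirror-image backward pass for µ_β, can be sketched as follows. The potentials are again hypothetical random tables, and the backward recursion is written by symmetry with the forward one rather than taken from the book's own code.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 6, 4
psi = [rng.uniform(0.1, 1.0, (K, K)) for _ in range(N - 1)]  # psi[i] links nodes i, i+1

def marginal(n):
    """p(x_n) for 0-based node n via forward and backward messages, O(N K^2)."""
    mu_alpha = np.ones(K)              # trivial message at the first node
    for i in range(n):                 # recursion (8.55), starting from (8.56)
        mu_alpha = psi[i].T @ mu_alpha
    mu_beta = np.ones(K)               # trivial message at the last node
    for i in range(N - 2, n - 1, -1):  # mirror-image backward recursion
        mu_beta = psi[i] @ mu_beta
    unnorm = mu_alpha * mu_beta        # point-wise product of the two messages
    return unnorm / unnorm.sum()       # (8.54); the sum over x_n supplies 1/Z

print(marginal(3))                     # K probabilities summing to one
```

Because Σ_{x_n} µ_α(x_n) µ_β(x_n) = Z for any choice of n, the normalization can be read off at whichever node the two messages meet.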
