Bishop, C.M. Pattern Recognition and Machine Learning (2006)
However, there is nothing specific to discrete variables either in the graphical framework or in the probabilistic construction of the sum-product algorithm. For continuous variables the summations are simply replaced by integrations.

Table 8.1  Example of a joint distribution over two binary variables for which the maximum of the joint distribution occurs for different variable values compared to the maxima of the two marginals.

          y = 0    y = 1
  x = 0    0.3      0.3
  x = 1    0.4      0.0
We shall give an example of the sum-product algorithm applied to a graph of linear-Gaussian variables when we consider linear dynamical systems (Section 13.3).

8.4.5 The max-sum algorithm

The sum-product algorithm allows us to take a joint distribution $p(\mathbf{x})$ expressed as a factor graph and efficiently find marginals over the component variables. Two other common tasks are to find a setting of the variables that has the largest probability and to find the value of that probability. These can be addressed through a closely related algorithm called max-sum, which can be viewed as an application of dynamic programming in the context of graphical models (Cormen et al., 2001).

A simple approach to finding latent variable values having high probability would be to run the sum-product algorithm to obtain the marginals $p(x_i)$ for every variable, and then, for each marginal in turn, to find the value of $x_i$ that maximizes that marginal.
However, this would give the set of values that are individually the most probable. In practice, we typically wish to find the set of values that jointly have the largest probability, in other words the vector $\mathbf{x}^{\max}$ that maximizes the joint distribution, so that

$$\mathbf{x}^{\max} = \arg\max_{\mathbf{x}} p(\mathbf{x}) \tag{8.87}$$

for which the corresponding value of the joint probability will be given by

$$p(\mathbf{x}^{\max}) = \max_{\mathbf{x}} p(\mathbf{x}). \tag{8.88}$$

In general, $\mathbf{x}^{\max}$ is not the same as the set of individually most probable values, as we can easily show using a simple example (Exercise 8.27). Consider the joint distribution $p(x, y)$ over two binary variables $x, y \in \{0, 1\}$ given in Table 8.1.
The joint distribution is maximized by setting $x = 1$ and $y = 0$, corresponding to the value 0.4. However, the marginal for $p(x)$, obtained by summing over both values of $y$, is given by $p(x = 0) = 0.6$ and $p(x = 1) = 0.4$, and similarly the marginal for $y$ is given by $p(y = 0) = 0.7$ and $p(y = 1) = 0.3$, and so the marginals are maximized by $x = 0$ and $y = 0$, which corresponds to a value of 0.3 for the joint distribution. In fact, it is not difficult to construct examples for which the set of individually most probable values has probability zero under the joint distribution.
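These numbers are easy to verify directly. The following minimal sketch (the array layout and variable names are our own, not from the text) checks both maximizations for Table 8.1:

```python
import numpy as np

# Joint distribution from Table 8.1: rows index x, columns index y.
p = np.array([[0.3, 0.3],
              [0.4, 0.0]])

# Jointly most probable configuration: (x, y) = (1, 0), with p = 0.4.
x_j, y_j = np.unravel_index(np.argmax(p), p.shape)
print(x_j, y_j, p[x_j, y_j])                      # 1 0 0.4

# Individually most probable values from the two marginals.
p_x = p.sum(axis=1)                               # [0.6, 0.4] -> x = 0
p_y = p.sum(axis=0)                               # [0.7, 0.3] -> y = 0
print(p_x.argmax(), p_y.argmax(),
      p[p_x.argmax(), p_y.argmax()])              # 0 0 0.3
```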
We therefore seek an efficient algorithm for finding the value of $\mathbf{x}$ that maximizes the joint distribution $p(\mathbf{x})$ and that will allow us to obtain the value of the joint distribution at its maximum. To address the second of these problems, we shall simply write out the max operator in terms of its components

$$\max_{\mathbf{x}} p(\mathbf{x}) = \max_{x_1} \cdots \max_{x_M} p(\mathbf{x}) \tag{8.89}$$

where $M$ is the total number of variables, and then substitute for $p(\mathbf{x})$ using its expansion in terms of a product of factors. In deriving the sum-product algorithm, we made use of the distributive law (8.53) for multiplication. Here we make use of the analogous law for the max operator

$$\max(ab, ac) = a \max(b, c) \tag{8.90}$$

which holds if $a \geqslant 0$ (as will always be the case for the factors in a graphical model). This allows us to exchange products with maximizations.
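The identity (8.90) is also easy to confirm numerically; a one-line check with arbitrary nonnegative values:

```python
a, b, c = 0.5, 0.2, 0.8
assert max(a * b, a * c) == a * max(b, c)  # (8.90); requires a >= 0
```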
Consider first the simple example of a chain of nodes described by (8.49). The evaluation of the probability maximum can be written as

$$\begin{aligned}
\max_{\mathbf{x}} p(\mathbf{x}) &= \frac{1}{Z} \max_{x_1} \cdots \max_{x_N} \big[ \psi_{1,2}(x_1, x_2) \cdots \psi_{N-1,N}(x_{N-1}, x_N) \big] \\
&= \frac{1}{Z} \max_{x_1} \Big[ \psi_{1,2}(x_1, x_2) \Big[ \cdots \max_{x_N} \psi_{N-1,N}(x_{N-1}, x_N) \Big] \Big].
\end{aligned}$$

As with the calculation of marginals, we see that exchanging the max and product operators results in a much more efficient computation, and one that is easily interpreted in terms of messages passed from node $x_N$ backwards along the chain to node $x_1$.
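To make the efficiency gain concrete, here is a small sketch that passes max-messages backwards along such a chain and checks the result against brute-force enumeration. The random positive potentials and all names are illustrative, and the constant factor $1/Z$ is omitted since it does not affect the comparison:

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 3  # chain of N variables, each taking K states
# Hypothetical positive pairwise potentials: psi[n][i, j] = psi_{n+1,n+2}(i, j).
psi = [rng.uniform(0.1, 1.0, size=(K, K)) for _ in range(N - 1)]

# Pass max-messages from x_N backwards along the chain to x_1.
m = np.ones(K)
for table in reversed(psi):
    m = (table * m[None, :]).max(axis=1)  # max over the next variable's state
max_by_messages = m.max()  # unnormalized maximum of the joint

# Brute-force check over all K**N configurations.
def joint(cfg):
    return np.prod([psi[n][cfg[n], cfg[n + 1]] for n in range(N - 1)])

max_by_enumeration = max(joint(c) for c in product(range(K), repeat=N))
assert np.isclose(max_by_messages, max_by_enumeration)
```

The message-passing loop costs $O(NK^2)$ operations, whereas the enumeration costs $O(K^N)$.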
We can readily generalize this result to arbitrary tree-structured factor graphs by substituting the expression (8.59) for the factor graph expansion into (8.89) and again exchanging maximizations with products. The structure of this calculation is identical to that of the sum-product algorithm, and so we can simply translate those results into the present context. In particular, suppose that we designate a particular variable node as the 'root' of the graph. Then we start a set of messages propagating inwards from the leaves of the tree towards the root, with each node sending its message towards the root once it has received all incoming messages from its other neighbours. The final maximization is performed over the product of all messages arriving at the root node, and gives the maximum value for $p(\mathbf{x})$. This could be called the max-product algorithm and is identical to the sum-product algorithm except that summations are replaced by maximizations.
Note that at this stage, messages have been sent from leaves to the root, but not in the other direction. In practice, products of many small probabilities can lead to numerical underflow problems, and so it is convenient to work with the logarithm of the joint distribution. The logarithm is a monotonic function, so that if $a > b$ then $\ln a > \ln b$, and hence the max operator and the logarithm function can be interchanged, so that

$$\ln\left(\max_{\mathbf{x}} p(\mathbf{x})\right) = \max_{\mathbf{x}} \ln p(\mathbf{x}). \tag{8.91}$$

The distributive property is preserved because

$$\max(a + b, a + c) = a + \max(b, c). \tag{8.92}$$

Thus taking the logarithm simply has the effect of replacing the products in the max-product algorithm with sums, and so we obtain the max-sum algorithm.
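A two-line illustration of why the log domain matters: with 2000 factors of 0.1 the direct product underflows double precision, while the sum of logs is perfectly well behaved.

```python
import numpy as np

p = np.full(2000, 0.1)       # 2000 factors, each equal to 0.1
print(np.prod(p))            # 0.0 -- the product underflows double precision
print(np.log(p).sum())       # about -4605.17, easily representable
```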
From the results (8.66) and (8.69) derived earlier for the sum-product algorithm, we can readily write down the max-sum algorithm in terms of message passing simply by replacing 'sum' with 'max' and replacing products with sums of logarithms to give

$$\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M} \left[ \ln f(x, x_1, \ldots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m) \right] \tag{8.93}$$

$$\mu_{x \to f}(x) = \sum_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x). \tag{8.94}$$

The initial messages sent by the leaf nodes are obtained by analogy with (8.70) and (8.71) and are given by

$$\mu_{x \to f}(x) = 0 \tag{8.95}$$

$$\mu_{f \to x}(x) = \ln f(x) \tag{8.96}$$

while at the root node the maximum probability can then be computed, by analogy with (8.63), using

$$p^{\max} = \max_{x} \left[ \sum_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x) \right]. \tag{8.97}$$
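Specialized to the chain of the earlier sketch, the recursions (8.93)-(8.97) in the log domain collapse to the following few lines (again with illustrative random potentials; the additive constant $-\ln Z$ is ignored):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 6, 4
# Assumed setup: log-potentials ln f_{n,n+1}(x_n, x_{n+1}) for a simple chain.
log_f = [np.log(rng.uniform(0.1, 1.0, size=(K, K))) for _ in range(N - 1)]

msg = np.zeros(K)  # leaf message mu_{x_1 -> f_{1,2}} = 0, as in (8.95)
for lf in log_f:
    # (8.93) on a chain: maximize over the previous variable's state;
    # (8.94) then simply forwards the result to the next factor.
    msg = (lf + msg[:, None]).max(axis=0)
log_p_max = msg.max()  # (8.97) evaluated at the root x_N
```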
So far, we have seen how to find the maximum of the joint distribution by propagating messages from the leaves to an arbitrarily chosen root node. The result will be the same irrespective of which node is chosen as the root. Now we turn to the second problem of finding the configuration of the variables for which the joint distribution attains this maximum value. So far, we have sent messages from the leaves to the root. The process of evaluating (8.97) will also give the value $x^{\max}$ for the most probable value of the root node variable, defined by

$$x^{\max} = \arg\max_{x} \left[ \sum_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x) \right]. \tag{8.98}$$

At this point, we might be tempted simply to continue with the message passing algorithm and send messages from the root back out to the leaves, using (8.93) and (8.94), then apply (8.98) to all of the remaining variable nodes. However, because we are now maximizing rather than summing, it is possible that there may be multiple configurations of $\mathbf{x}$ all of which give rise to the maximum value for $p(\mathbf{x})$.
In such cases, this strategy can fail because it is possible for the individual variable values obtained by maximizing the product of messages at each node to belong to different maximizing configurations, giving an overall configuration that no longer corresponds to a maximum.

The problem can be resolved by adopting a rather different kind of message passing from the root node to the leaves. To see how this works, let us return once again to the simple chain example of $N$ variables $x_1, \ldots, x_N$, each having $K$ states, corresponding to the graph shown in Figure 8.38.
Figure 8.53  A lattice, or trellis, diagram showing explicitly the $K$ possible states (one per row of the diagram) for each of the variables $x_n$ in the chain model. In this illustration $K = 3$. The arrow shows the direction of message passing in the max-product algorithm. For every state $k$ of each variable $x_n$ (corresponding to column $n$ of the diagram) the function $\phi(x_n)$ defines a unique state at the previous variable, indicated by the black lines. The two paths through the lattice correspond to configurations that give the global maximum of the joint probability distribution, and either of these can be found by tracing back along the black lines in the opposite direction to the arrow.
Suppose we take node $x_N$ to be the root node. Then in the first phase, we propagate messages from the leaf node $x_1$ to the root node using

$$\mu_{x_n \to f_{n,n+1}}(x_n) = \mu_{f_{n-1,n} \to x_n}(x_n)$$

$$\mu_{f_{n-1,n} \to x_n}(x_n) = \max_{x_{n-1}} \left[ \ln f_{n-1,n}(x_{n-1}, x_n) + \mu_{x_{n-1} \to f_{n-1,n}}(x_{n-1}) \right]$$

which follow from applying (8.94) and (8.93) to this particular graph.
The initial message sent from the leaf node is simply

$$\mu_{x_1 \to f_{1,2}}(x_1) = 0. \tag{8.99}$$

The most probable value for $x_N$ is then given by

$$x_N^{\max} = \arg\max_{x_N} \left[ \mu_{f_{N-1,N} \to x_N}(x_N) \right]. \tag{8.100}$$

Now we need to determine the states of the previous variables that correspond to the same maximizing configuration. This can be done by keeping track of which values of the variables gave rise to the maximum state of each variable, in other words by storing quantities given by

$$\phi(x_n) = \arg\max_{x_{n-1}} \left[ \ln f_{n-1,n}(x_{n-1}, x_n) + \mu_{x_{n-1} \to f_{n-1,n}}(x_{n-1}) \right]. \tag{8.101}$$

To understand better what is happening, it is helpful to represent the chain of variables in terms of a lattice or trellis diagram as shown in Figure 8.53. Note that this is not a probabilistic graphical model because the nodes represent individual states of variables, while each variable corresponds to a column of such states in the diagram.
For each state of a given variable, there is a unique state of the previous variable that maximizes the probability (ties are broken either systematically or at random), corresponding to the function $\phi(x_n)$ given by (8.101), and this is indicated by the lines connecting the nodes. Once we know the most probable value of the final node $x_N$, we can then simply follow the link back to find the most probable state of node $x_{N-1}$ and so on back to the initial node $x_1$.
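Putting the pieces together, here is a compact sketch of the full procedure on a chain, storing the back-pointers $\phi$ of (8.101) during the forward pass and then tracing back from $x_N$ as just described. The function name and the random potentials are our own illustrative choices:

```python
import numpy as np

def max_sum_chain(log_f):
    """Most probable configuration of a chain via max-sum with
    back-tracking (8.99)-(8.101). log_f[n] holds ln f_{n,n+1}."""
    K = log_f[0].shape[0]
    msg = np.zeros(K)                      # mu_{x_1 -> f_{1,2}} = 0  (8.99)
    phi = []                               # back-pointers, one array per step
    for lf in log_f:
        scores = lf + msg[:, None]        # scores[i, j]: x_{n-1} = i, x_n = j
        phi.append(scores.argmax(axis=0)) # phi(x_n)                  (8.101)
        msg = scores.max(axis=0)
    x = [int(msg.argmax())]               # x_N^max                   (8.100)
    for back in reversed(phi):            # trace back along the trellis
        x.append(int(back[x[-1]]))
    return x[::-1]                        # states x_1, ..., x_N

rng = np.random.default_rng(2)
log_f = [np.log(rng.uniform(0.1, 1.0, (3, 3))) for _ in range(4)]
print(max_sum_chain(log_f))               # a list of 5 states in {0, 1, 2}
```

Because every column's argmax is resolved once during the forward pass, the traceback necessarily returns a single consistent maximizing configuration, which is exactly what the naive root-to-leaves message passing could not guarantee.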