Bishop C.M. Pattern Recognition and Machine Learning (2006)
This follows from the fact that there is no direct path between the two nodes, and all other paths pass through nodes that are observed, and hence those paths are blocked. This conditional independence property can be expressed as

p(x_i, x_j | x_{∖{i,j}}) = p(x_i | x_{∖{i,j}}) p(x_j | x_{∖{i,j}})        (8.38)

where x_{∖{i,j}} denotes the set x of all variables with x_i and x_j removed. The factorization of the joint distribution must therefore be such that x_i and x_j do not appear in the same factor in order for the conditional independence property to hold for all possible distributions belonging to the graph.

This leads us to consider a graphical concept called a clique, which is defined as a subset of the nodes in a graph such that there exists a link between all pairs of nodes in the subset. In other words, the set of nodes in a clique is fully connected. Furthermore, a maximal clique is a clique such that it is not possible to include any other nodes from the graph in the set without it ceasing to be a clique.
These concepts are illustrated by the undirected graph over four variables shown in Figure 8.29. This graph has five cliques of two nodes given by {x_1, x_2}, {x_2, x_3}, {x_3, x_4}, {x_4, x_2}, and {x_1, x_3}, as well as two maximal cliques given by {x_1, x_2, x_3} and {x_2, x_3, x_4}. The set {x_1, x_2, x_3, x_4} is not a clique because of the missing link from x_1 to x_4.

We can therefore define the factors in the decomposition of the joint distribution to be functions of the variables in the cliques. In fact, we can consider functions of the maximal cliques, without loss of generality, because other cliques must be subsets of maximal cliques. Thus, if {x_1, x_2, x_3} is a maximal clique and we define an arbitrary function over this clique, then including another factor defined over a subset of these variables would be redundant.
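As a concrete check of these definitions, the following short sketch enumerates, by brute force, the cliques and maximal cliques of the graph in Figure 8.29; the edge list simply encodes the five links listed above.

```python
from itertools import combinations

# Edges of the four-node graph of Figure 8.29 (the five links listed in the text).
edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 2), (1, 3)]}
nodes = {1, 2, 3, 4}

def is_clique(subset):
    """A subset of nodes is a clique if every pair of its nodes is linked."""
    return all(frozenset(pair) in edges for pair in combinations(subset, 2))

# All cliques containing at least two nodes.
cliques = [set(s) for r in range(2, len(nodes) + 1)
           for s in combinations(sorted(nodes), r) if is_clique(s)]

# A clique is maximal if no strict superset of it is also a clique.
maximal = [c for c in cliques if not any(c < d for d in cliques)]

print("cliques: ", cliques)   # the five pairs plus the two triangles
print("maximal: ", maximal)   # {1, 2, 3} and {2, 3, 4}
```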
Figure 8.29 A four-node undirected graph, over variables x_1, x_2, x_3, x_4, showing a clique (outlined in green) and a maximal clique (outlined in blue).

Let us denote a clique by C and the set of variables in that clique by x_C. Then the joint distribution is written as a product of potential functions ψ_C(x_C) over the maximal cliques of the graph

p(x) = (1/Z) ∏_C ψ_C(x_C)        (8.39)

Here the quantity Z, sometimes called the partition function, is a normalization constant and is given by

Z = ∑_x ∏_C ψ_C(x_C)        (8.40)

which ensures that the distribution p(x) given by (8.39) is correctly normalized. By considering only potential functions which satisfy ψ_C(x_C) ≥ 0 we ensure that p(x) ≥ 0.
In (8.40) we have assumed that x comprises discrete variables, but the framework is equally applicable to continuous variables, or a combination of the two, in which case the summation is replaced by the appropriate combination of summation and integration.

Note that we do not restrict the choice of potential functions to those that have a specific probabilistic interpretation as marginal or conditional distributions. This is in contrast to directed graphs in which each factor represents the conditional distribution of the corresponding variable, conditioned on the state of its parents.
However, in special cases, for instance where the undirected graph is constructed by starting with a directed graph, the potential functions may indeed have such an interpretation, as we shall see shortly.

One consequence of the generality of the potential functions ψ_C(x_C) is that their product will in general not be correctly normalized.
We therefore have to introduce an explicit normalization factor given by (8.40). Recall that for directed graphs, the joint distribution was automatically normalized as a consequence of the normalization of each of the conditional distributions in the factorization.

The presence of this normalization constant is one of the major limitations of undirected graphs. If we have a model with M discrete nodes each having K states, then the evaluation of the normalization term involves summing over K^M states and so (in the worst case) is exponential in the size of the model.
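To make this cost concrete, here is a minimal sketch that evaluates (8.40) by brute force for the graph of Figure 8.29 with binary variables (M = 4, K = 2), summing the product of the two maximal-clique potentials over all K^M = 16 joint states; the particular potential tables are arbitrary nonnegative values chosen only for illustration.

```python
import itertools
import numpy as np

# Binary states for each of the M = 4 variables (K = 2), as in Figure 8.29.
states = [0, 1]
rng = np.random.default_rng(0)

# One nonnegative potential per maximal clique {x1, x2, x3} and {x2, x3, x4};
# the entries are arbitrary positive numbers used purely as an example.
psi_123 = rng.uniform(0.5, 2.0, size=(2, 2, 2))
psi_234 = rng.uniform(0.5, 2.0, size=(2, 2, 2))

def unnormalized(x1, x2, x3, x4):
    """Product of maximal-clique potentials, the numerator of (8.39)."""
    return psi_123[x1, x2, x3] * psi_234[x2, x3, x4]

# Partition function (8.40): sum over all K**M joint configurations.
Z = sum(unnormalized(*x) for x in itertools.product(states, repeat=4))

# Normalized joint distribution (8.39) for one particular configuration.
x = (0, 1, 1, 0)
p_x = unnormalized(*x) / Z
print(f"Z = {Z:.4f}, p{x} = {p_x:.4f}")
```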
The partition function is needed for parameter learning because it will be a function of any parameters that govern the potential functions ψ_C(x_C). However, for evaluation of local conditional distributions, the partition function is not needed, because a conditional is the ratio of two marginals, and the partition function cancels between numerator and denominator when evaluating this ratio. Similarly, for evaluating local marginal probabilities we can work with the unnormalized joint distribution and then normalize the marginals explicitly at the end.
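In the same spirit, the sketch below (again with arbitrary potentials chosen only for illustration) evaluates a local conditional distribution as a ratio in which Z cancels, and a local marginal by summing the unnormalized joint and normalizing explicitly at the end.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary illustrative potentials over the two maximal cliques of Figure 8.29.
psi_123 = rng.uniform(0.5, 2.0, size=(2, 2, 2))
psi_234 = rng.uniform(0.5, 2.0, size=(2, 2, 2))

def p_tilde(x1, x2, x3, x4):
    """Unnormalized joint: the product of clique potentials, without 1/Z."""
    return psi_123[x1, x2, x3] * psi_234[x2, x3, x4]

# Conditional p(x1 = 1 | x2, x3, x4): Z cancels between numerator and denominator.
x2, x3, x4 = 1, 0, 1
cond = p_tilde(1, x2, x3, x4) / sum(p_tilde(v, x2, x3, x4) for v in (0, 1))

# Marginal p(x1): sum the unnormalized joint over the remaining variables,
# then normalize the resulting two numbers explicitly at the end.
m = [sum(p_tilde(v, *rest) for rest in itertools.product((0, 1), repeat=3))
     for v in (0, 1)]
marg = np.array(m) / np.sum(m)

print(f"p(x1=1 | x2={x2}, x3={x3}, x4={x4}) = {cond:.4f}")
print(f"p(x1) = {marg}")
```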
Provided the marginals only involve a small number of variables, the evaluation of their normalization coefficient will be feasible.

So far, we have discussed the notion of conditional independence based on simple graph separation, and we have proposed a factorization of the joint distribution that is intended to correspond to this conditional independence structure. However, we have not made any formal connection between conditional independence and factorization for undirected graphs.
To do so we need to restrict attention to potential functions ψ_C(x_C) that are strictly positive (i.e., never zero or negative for any choice of x_C). Given this restriction, we can make a precise relationship between factorization and conditional independence.

To do this we again return to the concept of a graphical model as a filter, corresponding to Figure 8.25. Consider the set of all possible distributions defined over a fixed set of variables corresponding to the nodes of a particular undirected graph. We can define UI to be the set of such distributions that are consistent with the set of conditional independence statements that can be read from the graph using graph separation.
Similarly, we can define UF to be the set of such distributions that can be expressed as a factorization of the form (8.39) with respect to the maximal cliques of the graph. The Hammersley-Clifford theorem (Clifford, 1990) states that the sets UI and UF are identical.

Because we are restricted to potential functions which are strictly positive, it is convenient to express them as exponentials, so that

ψ_C(x_C) = exp{−E(x_C)}        (8.41)

where E(x_C) is called an energy function, and the exponential representation is called the Boltzmann distribution. The joint distribution is defined as the product of potentials, and so the total energy is obtained by adding the energies of each of the maximal cliques.

In contrast to the factors in the joint distribution for a directed graph, the potentials in an undirected graph do not have a specific probabilistic interpretation. Although this gives greater flexibility in choosing the potential functions, because there is no normalization constraint, it does raise the question of how to motivate a choice of potential function for a particular application.
This can be done by viewing the potential function as expressing which configurations of the local variables are preferred to others. Global configurations that have a relatively high probability are those that find a good balance in satisfying the (possibly conflicting) influences of the clique potentials. We turn now to a specific example to illustrate the use of undirected graphs.

8.3.3 Illustration: Image de-noising

We can illustrate the application of undirected graphs using an example of noise removal from a binary image (Besag, 1974; Geman and Geman, 1984; Besag, 1986). Although a very simple example, this is typical of more sophisticated applications. Let the observed noisy image be described by an array of binary pixel values y_i ∈ {−1, +1}, where the index i = 1, . . . , D runs over all pixels.
We shall suppose that the image is obtained by taking an unknown noise-free image, described by binary pixel values x_i ∈ {−1, +1}, and randomly flipping the sign of pixels with some small probability. An example binary image, together with a noise-corrupted image obtained by flipping the sign of the pixels with probability 10%, is shown in Figure 8.30. Given the noisy image, our goal is to recover the original noise-free image.

Because the noise level is small, we know that there will be a strong correlation between x_i and y_i.
Figure 8.30 Illustration of image de-noising using a Markov random field. The top row shows the original binary image on the left and the corrupted image after randomly changing 10% of the pixels on the right. The bottom row shows the restored images obtained using iterated conditional modes (ICM) on the left and using the graph-cut algorithm on the right. ICM produces an image in which 96% of the pixels agree with the original image, whereas the corresponding number for graph-cut is 99%.

We also know that neighbouring pixels x_i and x_j in an image are strongly correlated. This prior knowledge can be captured using the Markov random field model whose undirected graph is shown in Figure 8.31.
This graph has two types of cliques, each of which contains two variables. The cliques of the form {x_i, y_i} have an associated energy function that expresses the correlation between these variables. We choose a very simple energy function for these cliques of the form −η x_i y_i where η is a positive constant. This has the desired effect of giving a lower energy (thus encouraging a higher probability) when x_i and y_i have the same sign and a higher energy when they have the opposite sign.

The remaining cliques comprise pairs of variables {x_i, x_j} where i and j are indices of neighbouring pixels. Again, we want the energy to be lower when the pixels have the same sign than when they have the opposite sign, and so we choose an energy given by −β x_i x_j where β is a positive constant.

Because a potential function is an arbitrary, nonnegative function over a maximal clique, we can multiply it by any nonnegative functions of subsets of the clique, or equivalently we can add the corresponding energies.
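A simple way to use these clique energies for de-noising is the iterated conditional modes (ICM) algorithm referred to in the caption of Figure 8.30: starting from the noisy image, each pixel in turn is set to whichever of its two states gives the lower energy, holding the remaining pixels fixed. The sketch below illustrates this on a small synthetic image; the image size, the particular values of η and β, and the use of a 4-neighbourhood are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, beta = 2.1, 1.0          # assumed positive coupling constants
H, W = 32, 32                 # assumed image size

# Hypothetical noise-free binary image: a square of +1 pixels on a -1 background.
x_true = -np.ones((H, W), dtype=int)
x_true[8:24, 8:24] = 1

# Observed image y: flip each pixel's sign independently with probability 10%.
flip = rng.random((H, W)) < 0.10
y = np.where(flip, -x_true, x_true)

def local_energy(x, i, j, value):
    """Sum of clique energies -eta*x_i*y_i and -beta*x_i*x_j involving pixel (i, j)."""
    e = -eta * value * y[i, j]
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):     # 4-neighbourhood
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            e += -beta * value * x[ni, nj]
    return e

# Iterated conditional modes: start from the noisy image and repeatedly set each
# pixel to whichever of {-1, +1} gives the lower local energy.
x = y.copy()
for _ in range(10):
    for i in range(H):
        for j in range(W):
            x[i, j] = min((-1, 1), key=lambda v: local_energy(x, i, j, v))

print("pixels agreeing with the original:", f"{np.mean(x == x_true):.1%}")
```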