Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), страница 89
Текст из файла (страница 89)
If we present tothe filter the set of all possible distributions p(x) over the set of variables x, then thesubset of distributions that are passed by the filter will be denoted DF, for directedfactorization. This is illustrated in Figure 8.25. Alternatively, we can use the graph asa different kind of filter by first listing all of the conditional independence propertiesobtained by applying the d-separation criterion to the graph, and then allowing adistribution to pass only if it satisfies all of these properties. If we present all possibledistributions p(x) to this second kind of filter, then the d-separation theorem tells usthat the set of distributions that will be allowed through is precisely the set DF .It should be emphasized that the conditional independence properties obtainedfrom d-separation apply to any probabilistic model described by that particular directed graph.
This will be true, for instance, whether the variables are discrete orcontinuous or a combination of these. Again, we see that a particular graph is describing a whole family of probability distributions.At one extreme we have a fully connected graph that exhibits no conditional independence properties at all, and which can represent any possible joint probabilitydistribution over the given variables.
The set DF will contain all possible distribu-3828. GRAPHICAL MODELSDFp(x)Figure 8.25We can view a graphical model (in this case a directed graph) as a filter in which a probability distribution p(x) is allowed through the filter if, and only if, it satisfies the directedfactorization property (8.5). The set of all possible probability distributions p(x) that passthrough the filter is denoted DF. We can alternatively use the graph to filter distributionsaccording to whether they respect all of the conditional independencies implied by thed-separation properties of the graph. The d-separation theorem says that it is the sameset of distributions DF that will be allowed through this second kind of filter.tions p(x).
At the other extreme, we have the fully disconnected graph, i.e., onehaving no links at all. This corresponds to joint distributions which factorize into theproduct of the marginal distributions over the variables comprising the nodes of thegraph.Note that for any given graph, the set of distributions DF will include any distributions that have additional independence properties beyond those described bythe graph. For instance, a fully factorized distribution will always be passed throughthe filter implied by any graph over the corresponding set of variables.We end our discussion of conditional independence properties by exploring theconcept of a Markov blanket or Markov boundary. Consider a joint distributionp(x1 , . . .
, xD ) represented by a directed graph having D nodes, and consider theconditional distribution of a particular node with variables xi conditioned on all ofthe remaining variables xj=i . Using the factorization property (8.5), we can expressthis conditional distribution in the formp(xi |x{j=i} ) ==p(x1 , . . . , xD )p(x1 , .
. . , xD ) dxip(xk |pak )k p(xk |pak ) dxikin which the integral is replaced by a summation in the case of discrete variables. Wenow observe that any factor p(xk |pak ) that does not have any functional dependenceon xi can be taken outside the integral over xi , and will therefore cancel betweennumerator and denominator. The only factors that remain will be the conditionaldistribution p(xi |pai ) for node xi itself, together with the conditional distributionsfor any nodes xk such that node xi is in the conditioning set of p(xk |pak ), in otherwords for which xi is a parent of xk . The conditional p(xi |pai ) will depend on theparents of node xi , whereas the conditionals p(xk |pak ) will depend on the children8.3.
Markov Random FieldsFigure 8.26The Markov blanket of a node xi comprises the setof parents, children and co-parents of the node. Ithas the property that the conditional distribution ofxi , conditioned on all the remaining variables in thegraph, is dependent only on the variables in theMarkov blanket.383xiof xi as well as on the co-parents, in other words variables corresponding to parentsof node xk other than node xi . The set of nodes comprising the parents, the childrenand the co-parents is called the Markov blanket and is illustrated in Figure 8.26.
Wecan think of the Markov blanket of a node xi as being the minimal set of nodes thatisolates xi from the rest of the graph. Note that it is not sufficient to include only theparents and children of node xi because the phenomenon of explaining away meansthat observations of the child nodes will not block paths to the co-parents. We musttherefore observe the co-parent nodes also.8.3. Markov Random FieldsWe have seen that directed graphical models specify a factorization of the joint distribution over a set of variables into a product of local conditional distributions.
Theyalso define a set of conditional independence properties that must be satisfied by anydistribution that factorizes according to the graph. We turn now to the second major class of graphical models that are described by undirected graphs and that againspecify both a factorization and a set of conditional independence relations.A Markov random field, also known as a Markov network or an undirectedgraphical model (Kindermann and Snell, 1980), has a set of nodes each of whichcorresponds to a variable or group of variables, as well as a set of links each ofwhich connects a pair of nodes. The links are undirected, that is they do not carryarrows.
In the case of undirected graphs, it is convenient to begin with a discussionof conditional independence properties.8.3.1 Conditional independence propertiesSection 8.2In the case of directed graphs, we saw that it was possible to test whether a particular conditional independence property holds by applying a graphical test calledd-separation. This involved testing whether or not the paths connecting two sets ofnodes were ‘blocked’.
The definition of blocked, however, was somewhat subtledue to the presence of paths having head-to-head nodes. We might ask whether itis possible to define an alternative graphical semantics for probability distributionssuch that conditional independence is determined by simple graph separation. Thisis indeed the case and corresponds to undirected graphical models. By removing the3848.
GRAPHICAL MODELSFigure 8.27An example of an undirected graph inwhich every path from any node in setA to any node in set B passes throughat least one node in set C. Consequently the conditional independenceproperty A ⊥⊥ B | C holds for anyprobability distribution described by thisgraph.CBAdirectionality from the links of the graph, the asymmetry between parent and childnodes is removed, and so the subtleties associated with head-to-head nodes no longerarise.Suppose that in an undirected graph we identify three sets of nodes, denoted A,B, and C, and that we consider the conditional independence propertyA⊥⊥ B | C.(8.37)To test whether this property is satisfied by a probability distribution defined by agraph we consider all possible paths that connect nodes in set A to nodes in setB.
If all such paths pass through one or more nodes in set C, then all such paths are‘blocked’ and so the conditional independence property holds. However, if there is atleast one such path that is not blocked, then the property does not necessarily hold, ormore precisely there will exist at least some distributions corresponding to the graphthat do not satisfy this conditional independence relation. This is illustrated with anexample in Figure 8.27.
Note that this is exactly the same as the d-separation criterion except that there is no ‘explaining away’ phenomenon. Testing for conditionalindependence in undirected graphs is therefore simpler than in directed graphs.An alternative way to view the conditional independence test is to imagine removing all nodes in set C from the graph together with any links that connect tothose nodes. We then ask if there exists a path that connects any node in A to anynode in B.
If there are no such paths, then the conditional independence propertymust hold.The Markov blanket for an undirected graph takes a particularly simple form,because a node will be conditionally independent of all other nodes conditioned onlyon the neighbouring nodes, as illustrated in Figure 8.28.8.3.2 Factorization propertiesWe now seek a factorization rule for undirected graphs that will correspond tothe above conditional independence test. Again, this will involve expressing the jointdistribution p(x) as a product of functions defined over sets of variables that are localto the graph.
We therefore need to decide what is the appropriate notion of localityin this case.8.3. Markov Random FieldsFigure 8.28385For an undirected graph, the Markov blanket of a nodexi consists of the set of neighbouring nodes. It has theproperty that the conditional distribution of xi , conditionedon all the remaining variables in the graph, is dependentonly on the variables in the Markov blanket.If we consider two nodes xi and xj that are not connected by a link, then thesevariables must be conditionally independent given all other nodes in the graph.