Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), страница 87
Текст из файла (страница 87)
A formal proof can be found in Lauritzen (1996).8.2.1 Three example graphsWe begin our discussion of the conditional independence properties of directedgraphs by considering three simple examples each involving graphs having just threenodes. Together, these will motivate and illustrate the key concepts of d-separation.The first of the three examples is shown in Figure 8.15, and the joint distributioncorresponding to this graph is easily written down using the general result (8.5) togivep(a, b, c) = p(a|c)p(b|c)p(c).(8.23)If none of the variables are observed, then we can investigate whether a and b areindependent by marginalizing both sides of (8.23) with respect to c to givep(a|c)p(b|c)p(c).(8.24)p(a, b) =cIn general, this does not factorize into the product p(a)p(b), and soa ⊥⊥b|∅(8.25)3748.
GRAPHICAL MODELSFigure 8.16cAs in Figure 8.15 but where we have conditioned on thevalue of variable c.abwhere ∅ denotes the empty set, and the symbol ⊥⊥ means that the conditional independence property does not hold in general. Of course, it may hold for a particulardistribution by virtue of the specific numerical values associated with the variousconditional probabilities, but it does not follow in general from the structure of thegraph.Now suppose we condition on the variable c, as represented by the graph ofFigure 8.16.
From (8.23), we can easily write down the conditional distribution of aand b, given c, in the formp(a, b, c)p(c)= p(a|c)p(b|c)p(a, b|c) =and so we obtain the conditional independence propertya⊥⊥ b | c.We can provide a simple graphical interpretation of this result by consideringthe path from node a to node b via c.
The node c is said to be tail-to-tail with respect to this path because the node is connected to the tails of the two arrows, andthe presence of such a path connecting nodes a and b causes these nodes to be dependent. However, when we condition on node c, as in Figure 8.16, the conditionednode ‘blocks’ the path from a to b and causes a and b to become (conditionally)independent.We can similarly consider the graph shown in Figure 8.17. The joint distributioncorresponding to this graph is again obtained from our general formula (8.5) to givep(a, b, c) = p(a)p(c|a)p(b|c).(8.26)First of all, suppose that none of the variables are observed.
Again, we can test tosee if a and b are independent by marginalizing over c to givep(c|a)p(b|c) = p(a)p(b|a).p(a, b) = p(a)cFigure 8.17The second of our three examples of 3-node agraphs used to motivate the conditional independence framework for directed graphical models.cb8.2. Conditional IndependenceFigure 8.18As in Figure 8.17 but now conditioning on node c.ac375bwhich in general does not factorize into p(a)p(b), and soa ⊥⊥ b | ∅(8.27)as before.Now suppose we condition on node c, as shown in Figure 8.18. Using Bayes’theorem, together with (8.26), we obtainp(a, b, c)p(c)p(a)p(c|a)p(b|c)=p(c)= p(a|c)p(b|c)p(a, b|c) =and so again we obtain the conditional independence propertya ⊥⊥ b | c.As before, we can interpret these results graphically. The node c is said to behead-to-tail with respect to the path from node a to node b.
Such a path connectsnodes a and b and renders them dependent. If we now observe c, as in Figure 8.18,then this observation ‘blocks’ the path from a to b and so we obtain the conditionalindependence property a ⊥⊥ b | c.Finally, we consider the third of our 3-node examples, shown by the graph inFigure 8.19. As we shall see, this has a more subtle behaviour than the two previousgraphs.The joint distribution can again be written down using our general result (8.5) togivep(a, b, c) = p(a)p(b)p(c|a, b).(8.28)Consider first the case where none of the variables are observed.
Marginalizing bothsides of (8.28) over c we obtainp(a, b) = p(a)p(b)Figure 8.19The last of our three examples of 3-node graphs used to aexplore conditional independence properties in graphical models. This graph has rather different propertiesfrom the two previous examples.bc3768.
GRAPHICAL MODELSFigure 8.20As in Figure 8.19 but conditioning on the value of node ac. In this graph, the act of conditioning induces a dependence between a and b.bcand so a and b are independent with no variables observed, in contrast to the twoprevious examples. We can write this result asa⊥⊥ b | ∅.(8.29)Now suppose we condition on c, as indicated in Figure 8.20. The conditional distribution of a and b is then given byp(a, b|c) ==p(a, b, c)p(c)p(a)p(b)p(c|a, b)p(c)which in general does not factorize into the product p(a)p(b), and soa ⊥⊥ b | c.Exercise 8.10Thus our third example has the opposite behaviour from the first two. Graphically,we say that node c is head-to-head with respect to the path from a to b because itconnects to the heads of the two arrows.
When node c is unobserved, it ‘blocks’the path, and the variables a and b are independent. However, conditioning on c‘unblocks’ the path and renders a and b dependent.There is one more subtlety associated with this third example that we need toconsider. First we introduce some more terminology. We say that node y is a descendant of node x if there is a path from x to y in which each step of the pathfollows the directions of the arrows. Then it can be shown that a head-to-head pathwill become unblocked if either the node, or any of its descendants, is observed.In summary, a tail-to-tail node or a head-to-tail node leaves a path unblockedunless it is observed in which case it blocks the path. By contrast, a head-to-headnode blocks a path if it is unobserved, but once the node, and/or at least one of itsdescendants, is observed the path becomes unblocked.It is worth spending a moment to understand further the unusual behaviour of thegraph of Figure 8.20.
Consider a particular instance of such a graph correspondingto a problem with three binary random variables relating to the fuel system on a car,as shown in Figure 8.21. The variables are called B, representing the state of abattery that is either charged (B = 1) or flat (B = 0), F representing the state ofthe fuel tank that is either full of fuel (F = 1) or empty (F = 0), and G, which isthe state of an electric fuel gauge and which indicates either full (G = 1) or empty3778.2. Conditional IndependenceBFBFGBGFGFigure 8.21 An example of a 3-node graph used to illustrate the phenomenon of ‘explaining away’. The threenodes represent the state of the battery (B), the state of the fuel tank (F ) and the reading on the electric fuelgauge (G).
See the text for details.(G = 0). The battery is either charged or flat, and independently the fuel tank iseither full or empty, with prior probabilitiesp(B = 1) = 0.9p(F = 1) = 0.9.Given the state of the fuel tank and the battery, the fuel gauge reads full with probabilities given byp(G = 1|Bp(G = 1|Bp(G = 1|Bp(G = 1|B= 1, F= 1, F= 0, F= 0, F= 1)= 0)= 1)= 0)====0.80.20.20.1so this is a rather unreliable fuel gauge! All remaining probabilities are determinedby the requirement that probabilities sum to one, and so we have a complete specification of the probabilistic model.Before we observe any data, the prior probability of the fuel tank being emptyis p(F = 0) = 0.1. Now suppose that we observe the fuel gauge and discover thatit reads empty, i.e., G = 0, corresponding to the middle graph in Figure 8.21.
Wecan use Bayes’ theorem to evaluate the posterior probability of the fuel tank beingempty. First we evaluate the denominator for Bayes’ theorem given byp(G = 0|B, F )p(B)p(F ) = 0.315(8.30)p(G = 0) =B∈{0,1} F ∈{0,1}and similarly we evaluatep(G = 0|F = 0) =p(G = 0|B, F = 0)p(B) = 0.81(8.31)p(G = 0|F = 0)p(F = 0) 0.257p(G = 0)(8.32)B∈{0,1}and using these results we havep(F = 0|G = 0) =3788.
GRAPHICAL MODELSand so p(F = 0|G = 0) > p(F = 0). Thus observing that the gauge reads emptymakes it more likely that the tank is indeed empty, as we would intuitively expect.Next suppose that we also check the state of the battery and find that it is flat, i.e.,B = 0. We have now observed the states of both the fuel gauge and the battery, asshown by the right-hand graph in Figure 8.21. The posterior probability that the fueltank is empty given the observations of both the fuel gauge and the battery state isthen given byp(G = 0|B = 0, F = 0)p(F = 0)p(F = 0|G = 0, B = 0) = 0.111 (8.33)F ∈{0,1} p(G = 0|B = 0, F )p(F )where the prior probability p(B = 0) has cancelled between numerator and denominator.
Thus the probability that the tank is empty has decreased (from 0.257 to0.111) as a result of the observation of the state of the battery. This accords with ourintuition that finding out that the battery is flat explains away the observation that thefuel gauge reads empty. We see that the state of the fuel tank and that of the batteryhave indeed become dependent on each other as a result of observing the readingon the fuel gauge.
In fact, this would also be the case if, instead of observing thefuel gauge directly, we observed the state of some descendant of G. Note that theprobability p(F = 0|G = 0, B = 0) 0.111 is greater than the prior probabilityp(F = 0) = 0.1 because the observation that the fuel gauge reads zero still providessome evidence in favour of an empty fuel tank.8.2.2 D-separationWe now give a general statement of the d-separation property (Pearl, 1988) fordirected graphs. Consider a general directed graph in which A, B, and C are arbitrary nonintersecting sets of nodes (whose union may be smaller than the completeset of nodes in the graph). We wish to ascertain whether a particular conditionalindependence statement A ⊥⊥ B | C is implied by a given directed acyclic graph. Todo so, we consider all possible paths from any node in A to any node in B.