Bishop C.M., Pattern Recognition and Machine Learning (2006), page 98 excerpt
Because there are loops in the graph, this raises the problem of how to initiate the message passing algorithm. To resolve this, we suppose that an initial message given by the unit function has been passed across every link in each direction. Every node is then in a position to send a message.

There are now many possible ways to organize the message passing schedule. For example, the flooding schedule simultaneously passes a message across every link in both directions at each time step, whereas schedules that pass one message at a time are called serial schedules.

Following Kschischang et al. (2001), we will say that a (variable or factor) node a has a message pending on its link to a node b if node a has received any message on any of its other links since the last time it sent a message to b. Thus, when a node receives a message on one of its links, this creates pending messages on all of its other links. Only pending messages need to be transmitted, because other messages would simply duplicate the previous message on the same link (Exercise 8.29). For graphs that have a tree structure, any schedule that sends only pending messages will eventually terminate once a message has passed in each direction across every link. At this point, there are no pending messages, and the product of the received messages at every variable gives the exact marginal. In graphs having loops, however, the algorithm may never terminate because there might always be pending messages, although in practice it is generally found to converge within a reasonable time for most applications. Once the algorithm has converged, or once it has been stopped if convergence is not observed, the (approximate) local marginals can be computed using the product of the most recently received incoming messages to each variable node or factor node on every link.
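As a concrete illustration of these ideas, the following minimal sketch runs sum-product message passing with a flooding schedule on a small factor graph containing a single loop. The graph, the factor tables, and the iteration count are hypothetical choices for this example; messages are initialized to the unit function, and the approximate marginals are read off as the normalized product of incoming messages.

import numpy as np

# Three binary variables a, b, c joined in a cycle by pairwise factors.
# Each factor table T is indexed as T[first variable, second variable].
factors = {
    'f_ab': (('a', 'b'), np.array([[1.0, 0.5], [0.5, 1.0]])),
    'f_bc': (('b', 'c'), np.array([[1.0, 0.2], [0.2, 1.0]])),
    'f_ca': (('c', 'a'), np.array([[1.0, 0.8], [0.3, 1.0]])),
}
variables = ['a', 'b', 'c']

# Initial messages: the unit function on every link, in both directions.
msg_v2f = {(v, f): np.ones(2) for f, (vs, _) in factors.items() for v in vs}
msg_f2v = {(f, v): np.ones(2) for f, (vs, _) in factors.items() for v in vs}

for _ in range(50):  # flooding schedule: update every message at every step
    new_v2f = {}
    for (v, f) in msg_v2f:
        # Variable-to-factor: product of messages arriving on the other links.
        m = np.ones(2)
        for g, (vs, _) in factors.items():
            if g != f and v in vs:
                m = m * msg_f2v[(g, v)]
        new_v2f[(v, f)] = m / m.sum()  # normalize for numerical stability
    new_f2v = {}
    for (f, v) in msg_f2v:
        vs, T = factors[f]
        # Factor-to-variable: sum out the other variable against its message.
        m = T @ msg_v2f[(vs[1], f)] if v == vs[0] else msg_v2f[(vs[0], f)] @ T
        new_f2v[(f, v)] = m / m.sum()
    msg_v2f, msg_f2v = new_v2f, new_f2v

# Approximate marginal at each variable: product of incoming messages.
for v in variables:
    b = np.ones(2)
    for f, (vs, _) in factors.items():
        if v in vs:
            b = b * msg_f2v[(f, v)]
    print(v, b / b.sum())

On a tree the same updates would yield the exact marginals after a number of steps equal to the diameter of the graph; with the loop present, the messages typically settle to a fixed point whose normalized products are approximate marginals.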
In some applications, the loopy belief propagation algorithm can give poor results, whereas in other applications it has proven to be very effective. In particular, state-of-the-art algorithms for decoding certain kinds of error-correcting codes are equivalent to loopy belief propagation (Gallager, 1963; Berrou et al., 1993; McEliece et al., 1998; MacKay and Neal, 1999; Frey, 1998).

8.4.8 Learning the graph structure

In our discussion of inference in graphical models, we have assumed that the structure of the graph is known and fixed. However, there is also interest in going beyond the inference problem and learning the graph structure itself from data (Friedman and Koller, 2003). This requires that we define a space of possible structures as well as a measure that can be used to score each structure.

From a Bayesian viewpoint, we would ideally like to compute a posterior distribution over graph structures and to make predictions by averaging with respect to this distribution.
If we have a prior p(m) over graphs indexed by m, then the posterior distribution is given by

p(m|D) ∝ p(m)p(D|m)        (8.103)

where D is the observed data set. The model evidence p(D|m) then provides the score for each model. However, evaluation of the evidence involves marginalization over the latent variables and presents a challenging computational problem for many models.

Exploring the space of structures can also be problematic. Because the number of different graph structures grows exponentially with the number of nodes, it is often necessary to resort to heuristics to find good candidates.
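To make (8.103) concrete, the following minimal sketch scores two candidate structures over a pair of binary variables by their evidence. The data set, the uniform structure prior, and the Beta(1,1) parameter priors are hypothetical choices for this illustration; with a Beta(1,1) prior, the evidence of a Bernoulli variable observed with counts (n0, n1) integrates analytically to n0! n1!/(n0 + n1 + 1)!.

from math import factorial

def bernoulli_evidence(n0, n1):
    """Marginal likelihood of n0 zeros and n1 ones under a Beta(1,1) prior."""
    return factorial(n0) * factorial(n1) / factorial(n0 + n1 + 1)

# Hypothetical observations of (a, b) pairs.
data = [(0, 0), (0, 0), (1, 1), (1, 1), (1, 1), (0, 1)]

na1 = sum(a for a, _ in data); na0 = len(data) - na1
nb1 = sum(b for _, b in data); nb0 = len(data) - nb1
# Counts of b split by the value of a (needed for the a -> b structure).
nb1_a0 = sum(b for a, b in data if a == 0); nb0_a0 = sum(1 - b for a, b in data if a == 0)
nb1_a1 = sum(b for a, b in data if a == 1); nb0_a1 = sum(1 - b for a, b in data if a == 1)

# p(D|m) factorizes over the conditional distributions of each structure.
ev_m1 = bernoulli_evidence(na0, na1) * bernoulli_evidence(nb0, nb1)   # m1: a, b independent
ev_m2 = (bernoulli_evidence(na0, na1)                                  # m2: a -> b
         * bernoulli_evidence(nb0_a0, nb1_a0)
         * bernoulli_evidence(nb0_a1, nb1_a1))

# Posterior (8.103) with a uniform prior p(m1) = p(m2) = 1/2.
z = ev_m1 + ev_m2
print('p(m1|D) =', ev_m1 / z, ' p(m2|D) =', ev_m2 / z)   # approx. 0.31 vs 0.69

On this toy data set the structure with the extra link receives the higher posterior; on realistic problems the difficulty is that the number of structures, and hence of evidence evaluations, grows exponentially with the number of nodes.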
Exercises

8.1 (⋆) www By marginalizing out the variables in order, show that the representation (8.5) for the joint distribution of a directed graph is correctly normalized, provided each of the conditional distributions is normalized.

8.2 (⋆) www Show that the property of there being no directed cycles in a directed graph follows from the statement that there exists an ordered numbering of the nodes such that for each node there are no links going to a lower-numbered node.

Table 8.2  The joint distribution over three binary variables.

  a  b  c  p(a, b, c)
  0  0  0  0.192
  0  0  1  0.144
  0  1  0  0.048
  0  1  1  0.216
  1  0  0  0.192
  1  0  1  0.064
  1  1  0  0.048
  1  1  1  0.096

8.3 (⋆⋆) Consider three binary variables a, b, c ∈ {0, 1} having the joint distribution given in Table 8.2. Show by direct evaluation that this distribution has the property that a and b are marginally dependent, so that p(a, b) ≠ p(a)p(b), but that they become independent when conditioned on c, so that p(a, b|c) = p(a|c)p(b|c) for both c = 0 and c = 1.

8.4 (⋆⋆) Evaluate the distributions p(a), p(b|c), and p(c|a) corresponding to the joint distribution given in Table 8.2. Hence show by direct evaluation that p(a, b, c) = p(a)p(c|a)p(b|c). Draw the corresponding directed graph.
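The identities asked for in Exercises 8.3 and 8.4 can be checked numerically before attempting the algebra. The script below is a minimal sketch that assumes nothing beyond the entries of Table 8.2; the helper marg is a name introduced here for illustration.

# Joint distribution of Table 8.2, keyed by (a, b, c).
p = {(0, 0, 0): 0.192, (0, 0, 1): 0.144, (0, 1, 0): 0.048, (0, 1, 1): 0.216,
     (1, 0, 0): 0.192, (1, 0, 1): 0.064, (1, 1, 0): 0.048, (1, 1, 1): 0.096}

def marg(**fixed):
    """Sum p(a, b, c) over all entries consistent with the fixed values."""
    return sum(v for (a, b, c), v in p.items()
               if all(dict(a=a, b=b, c=c)[k] == x for k, x in fixed.items()))

# Exercise 8.3: a and b are marginally dependent ...
print(marg(a=1, b=1), marg(a=1) * marg(b=1))        # 0.144 vs 0.1632
# ... but independent when conditioned on c (spot check at a = b = 1).
for c in (0, 1):
    lhs = marg(a=1, b=1, c=c) / marg(c=c)
    rhs = marg(a=1, c=c) * marg(b=1, c=c) / marg(c=c) ** 2
    print(c, lhs, rhs)                               # equal for both values of c
# Exercise 8.4: the joint factorizes as p(a) p(c|a) p(b|c).
for (a, b, c), v in p.items():
    recon = marg(a=a) * (marg(a=a, c=c) / marg(a=a)) * (marg(b=b, c=c) / marg(c=c))
    assert abs(recon - v) < 1e-12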
8.5 (⋆) www Draw a directed probabilistic graphical model corresponding to the relevance vector machine described by (7.79) and (7.80).

8.6 (⋆) For the model shown in Figure 8.13, we have seen that the number of parameters required to specify the conditional distribution p(y|x1, . . . , xM), where xi ∈ {0, 1}, could be reduced from 2^M to M + 1 by making use of the logistic sigmoid representation (8.10). An alternative representation (Pearl, 1988) is given by

p(y = 1|x1, . . . , xM) = 1 − (1 − µ0) ∏_{i=1}^{M} (1 − µi)^{x_i}        (8.104)

where the parameters µi represent the probabilities p(xi = 1), and µ0 is an additional parameter satisfying 0 ≤ µ0 ≤ 1. The conditional distribution (8.104) is known as the noisy-OR. Show that this can be interpreted as a 'soft' (probabilistic) form of the logical OR function (i.e., the function that gives y = 1 whenever at least one of the xi = 1). Discuss the interpretation of µ0.
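A few evaluations of (8.104) make its OR-like behaviour, and the role of µ0, easy to see. The parameter values below are hypothetical, chosen only for illustration.

def noisy_or(x, mu, mu0):
    """Evaluate p(y = 1 | x) = 1 - (1 - mu0) * prod_i (1 - mu_i)^(x_i)."""
    prob = 1.0 - mu0
    for xi, mui in zip(x, mu):
        if xi == 1:
            prob *= 1.0 - mui
    return 1.0 - prob

mu = [0.9, 0.7, 0.5]                        # hypothetical values for the mu_i
print(noisy_or([0, 0, 0], mu, mu0=0.0))     # 0.0: y is off when every x_i = 0
print(noisy_or([1, 0, 0], mu, mu0=0.0))     # 0.9: a single active cause
print(noisy_or([1, 1, 1], mu, mu0=0.0))     # 0.985: near-certain with all causes active
print(noisy_or([0, 0, 0], mu, mu0=0.1))     # 0.1: mu0 lets y fire with no cause active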
8.7 (⋆⋆) Using the recursion relations (8.15) and (8.16), show that the mean and covariance of the joint distribution for the graph shown in Figure 8.14 are given by (8.17) and (8.18), respectively.

8.8 (⋆) www Show that a ⊥⊥ b, c | d implies a ⊥⊥ b | d.

8.9 (⋆) www Using the d-separation criterion, show that the conditional distribution for a node x in a directed graph, conditioned on all of the nodes in the Markov blanket, is independent of the remaining variables in the graph.

Figure 8.54  Example of a graphical model used to explore the conditional independence properties of the head-to-head path a–c–b when a descendant of c, namely the node d, is observed. [Figure: directed links from a to c, from b to c, and from c to d.]
8.10 (⋆) Consider the directed graph shown in Figure 8.54 in which none of the variables is observed. Show that a ⊥⊥ b | ∅. Suppose we now observe the variable d. Show that in general a ̸⊥⊥ b | d, that is, a and b are no longer conditionally independent.

8.11 (⋆⋆) Consider the example of the car fuel system shown in Figure 8.21, and suppose that instead of observing the state of the fuel gauge G directly, the gauge is seen by the driver D who reports to us the reading on the gauge.
This report is either that the gauge shows full, D = 1, or that it shows empty, D = 0. Our driver is a bit unreliable, as expressed through the following probabilities

p(D = 1|G = 1) = 0.9        (8.105)
p(D = 0|G = 0) = 0.9.       (8.106)

Suppose that the driver tells us that the fuel gauge shows empty, in other words that we observe D = 0. Evaluate the probability that the tank is empty given only this observation. Similarly, evaluate the corresponding probability given also the observation that the battery is flat, and note that this second probability is lower. Discuss the intuition behind this result, and relate the result to Figure 8.54.
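The two probabilities requested in Exercise 8.11 can be obtained by brute-force summation over the joint distribution. The sketch below assumes the priors and gauge table from the fuel-system example of Figure 8.21, which lies outside this excerpt, namely p(B = 1) = p(F = 1) = 0.9 and p(G = 1|B, F) = 0.8, 0.2, 0.2, 0.1 for (B, F) = (1, 1), (1, 0), (0, 1), (0, 0); check these values against the figure before relying on the numbers.

# B: battery charged, F: tank full, G: gauge reads full, D: driver's report.
pB = {1: 0.9, 0: 0.1}                                       # assumed from Figure 8.21
pF = {1: 0.9, 0: 0.1}                                       # assumed from Figure 8.21
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}  # p(G=1|B,F), assumed

def joint(b, f, g, d):
    p_g = pG1[(b, f)] if g == 1 else 1.0 - pG1[(b, f)]
    p_d = 0.9 if d == g else 0.1                            # (8.105) and (8.106)
    return pB[b] * pF[f] * p_g * p_d

def p_tank_empty(**obs):
    """p(F = 0 | obs) by summing the joint over all consistent states."""
    states = [(b, f, g, d) for b in (0, 1) for f in (0, 1)
              for g in (0, 1) for d in (0, 1)]
    keep = [s for s in states
            if all(dict(zip('BFGD', s))[k] == v for k, v in obs.items())]
    return sum(joint(*s) for s in keep if s[1] == 0) / sum(joint(*s) for s in keep)

print(p_tank_empty(D=0))         # approx. 0.213: driver reports empty
print(p_tank_empty(D=0, B=0))    # approx. 0.110: lower, as the exercise states

Under these assumed numbers the second probability is indeed lower than the first: the flat battery 'explains away' the report that the gauge shows empty.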
8.12 (⋆) www Show that there are 2^{M(M−1)/2} distinct undirected graphs over a set of M distinct random variables. Draw the 8 possibilities for the case of M = 3.

8.13 (⋆) Consider the use of iterated conditional modes (ICM) to minimize the energy function given by (8.42). Write down an expression for the difference in the values of the energy associated with the two states of a particular variable xj, with all other variables held fixed, and show that it depends only on quantities that are local to xj in the graph.
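The locality property in Exercise 8.13 is what makes ICM cheap: changing a single variable alters the energy only through terms involving that variable's neighbours. The sketch below assumes that (8.42) has the image de-noising form used earlier in the chapter, E(x, y) = h Σ_i xi − β Σ_{i,j} xi xj − η Σ_i xi yi with xi, yi ∈ {−1, +1} and {i, j} ranging over neighbouring pixel pairs; the parameter values and the test image are hypothetical.

import numpy as np

def icm(y, h=0.0, beta=1.0, eta=2.1, sweeps=10):
    """Iterated conditional modes on a 2-D array y of +/-1 pixel values."""
    x = y.copy()
    H, W = x.shape
    for _ in range(sweeps):
        changed = False
        for i in range(H):
            for j in range(W):
                # Sum over the neighbours of x_ij inside the image.
                nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # E(x_ij=+1) - E(x_ij=-1) = 2h - 2*beta*nb - 2*eta*y_ij:
                # the difference depends only on local quantities.
                new = -1 if 2*h - 2*beta*nb - 2*eta*y[i, j] > 0 else 1
                if new != x[i, j]:
                    x[i, j] = new
                    changed = True
        if not changed:
            break
    return x

# Hypothetical usage: de-noise a small +/-1 image with 10% flipped pixels.
rng = np.random.default_rng(0)
clean = np.ones((8, 8), dtype=int)
noisy = np.where(rng.random((8, 8)) < 0.1, -clean, clean)
print(icm(noisy))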
8.14 (⋆) Consider a particular case of the energy function given by (8.42) in which the coefficients β = h = 0. Show that the most probable configuration of the latent variables is given by xi = yi for all i.

8.15 (⋆⋆) www Show that the joint distribution p(xn−1, xn) for two neighbouring nodes in the graph shown in Figure 8.38 is given by an expression of the form (8.58).

8.16 (⋆⋆) Consider the inference problem of evaluating p(xn|xN) for the graph shown in Figure 8.38, for all nodes n ∈ {1, . . . , N − 1}. Show that the message passing algorithm discussed in Section 8.4.1 can be used to solve this efficiently, and discuss which messages are modified and in what way.
8.17 (⋆⋆) Consider a graph of the form shown in Figure 8.38 having N = 5 nodes, in which nodes x3 and x5 are observed. Use d-separation to show that x2 ⊥⊥ x5 | x3. Show that if the message passing algorithm of Section 8.4.1 is applied to the evaluation of p(x2|x3, x5), the result will be independent of the value of x5.

8.18 (⋆⋆) www Show that a distribution represented by a directed tree can trivially be written as an equivalent distribution over the corresponding undirected tree. Also show that a distribution expressed as an undirected tree can, by suitable normalization of the clique potentials, be written as a directed tree.
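For the first half of Exercise 8.18, the directed-to-undirected direction can be verified mechanically: each conditional distribution becomes the clique potential on the corresponding link, the root's prior is absorbed into one of its cliques, and the partition function is then Z = 1. The tree, the conditional probability tables, and all names below are hypothetical.

import itertools
import numpy as np

# Hypothetical directed tree over binary variables: x0 -> x1 and x0 -> x2.
p_x0 = np.array([0.6, 0.4])
p_x1_given_x0 = np.array([[0.9, 0.1], [0.2, 0.8]])   # rows index x0
p_x2_given_x0 = np.array([[0.7, 0.3], [0.5, 0.5]])

# Directed -> undirected: conditionals become clique potentials on the links,
# with the root's prior absorbed into one of its cliques.
psi_01 = p_x0[:, None] * p_x1_given_x0               # psi(x0, x1)
psi_02 = p_x2_given_x0                               # psi(x0, x2)

# The product of potentials reproduces the joint, so Z = 1.
for x0, x1, x2 in itertools.product((0, 1), repeat=3):
    directed = p_x0[x0] * p_x1_given_x0[x0, x1] * p_x2_given_x0[x0, x2]
    undirected = psi_01[x0, x1] * psi_02[x0, x2]
    assert abs(directed - undirected) < 1e-12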