Bishop C.M., Pattern Recognition and Machine Learning (2006): Chapter 2 exercises (excerpt)
First bring the integral over y inside the integrand of the integral over x, next make the change of variable t = y + x where x is fixed, then interchange the order of the x and t integrations, and finally make the change of variable x = tµ where t is fixed.

2.6 () Make use of the result (2.265) to show that the mean, variance, and mode of the beta distribution (2.13) are given respectively by

    E[\mu] = \frac{a}{a+b}                                   (2.267)

    \mathrm{var}[\mu] = \frac{ab}{(a+b)^2 (a+b+1)}           (2.268)

    \mathrm{mode}[\mu] = \frac{a-1}{a+b-2}.                  (2.269)
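As an editorial aside, the three results (2.267)–(2.269) are easy to sanity-check numerically with SciPy; the sketch below is not part of the original exercise, and the values of a and b are arbitrary choices with a, b > 1 so that the mode exists.

    # Numerical check of the beta-distribution mean, variance, and mode (2.267)-(2.269).
    import numpy as np
    from scipy import stats, optimize

    a, b = 3.0, 5.0  # arbitrary shape parameters
    dist = stats.beta(a, b)

    assert np.isclose(dist.mean(), a / (a + b))
    assert np.isclose(dist.var(), a * b / ((a + b) ** 2 * (a + b + 1)))

    # Locate the mode by maximizing the density numerically.
    mode = optimize.minimize_scalar(lambda mu: -dist.pdf(mu),
                                    bounds=(0, 1), method="bounded").x
    assert np.isclose(mode, (a - 1) / (a + b - 2), atol=1e-4)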
2.7 ( ) Consider a binomial random variable x given by (2.9), with prior distribution for µ given by the beta distribution (2.13), and suppose we have observed m occurrences of x = 1 and l occurrences of x = 0. Show that the posterior mean value of x lies between the prior mean and the maximum likelihood estimate for µ. To do this, show that the posterior mean can be written as λ times the prior mean plus (1 − λ) times the maximum likelihood estimate, where 0 ≤ λ ≤ 1. This illustrates the concept of the posterior distribution being a compromise between the prior distribution and the maximum likelihood solution.

2.8 () Consider two variables x and y with joint distribution p(x, y). Prove the following two results

    E[x] = E_y\big[E_x[x|y]\big]                                           (2.270)

    \mathrm{var}[x] = E_y\big[\mathrm{var}_x[x|y]\big] + \mathrm{var}_y\big[E_x[x|y]\big].   (2.271)

Here E_x[x|y] denotes the expectation of x under the conditional distribution p(x|y), with a similar notation for the conditional variance.
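The identities (2.270) and (2.271) can be checked by simulation; the small hierarchical model below (y Gaussian, x Gaussian given y) is an arbitrary editorial example whose conditional moments are known in closed form.

    # Monte Carlo check of the tower law (2.270) and law of total variance (2.271).
    import numpy as np

    rng = np.random.default_rng(0)
    N = 1_000_000

    y = rng.normal(1.0, 2.0, size=N)           # p(y) = N(y | 1, 2^2)
    x = rng.normal(0.5 * y + 3.0, 1.5)         # p(x|y) = N(x | 0.5 y + 3, 1.5^2)

    E_x_given_y = 0.5 * y + 3.0                # conditional mean of x given y
    var_x_given_y = 1.5 ** 2                   # conditional variance of x given y

    print(x.mean(), E_x_given_y.mean())                      # both approx 3.5
    print(x.var(), var_x_given_y + E_x_given_y.var())        # both approx 3.25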
2.9 ( ) www In this exercise, we prove the normalization of the Dirichlet distribution (2.38) using induction. We have already shown in Exercise 2.5 that the beta distribution, which is a special case of the Dirichlet for M = 2, is normalized. We now assume that the Dirichlet distribution is normalized for M − 1 variables and prove that it is normalized for M variables. To do this, consider the Dirichlet distribution over M variables, and take account of the constraint \sum_{k=1}^{M} \mu_k = 1 by eliminating µ_M, so that the Dirichlet is written

    p_M(\mu_1, \ldots, \mu_{M-1}) = C_M \prod_{k=1}^{M-1} \mu_k^{\alpha_k - 1} \left( 1 - \sum_{j=1}^{M-1} \mu_j \right)^{\alpha_M - 1}    (2.272)

and our goal is to find an expression for C_M. To do this, integrate over µ_{M−1}, taking care over the limits of integration, and then make a change of variable so that this integral has limits 0 and 1. By assuming the correct result for C_{M−1} and making use of (2.265), derive the expression for C_M.
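Although the exercise asks for an inductive proof, the normalization can also be checked numerically for a small case. The editorial sketch below integrates the M = 3 density of (2.272) over the simplex, taking C_M to be the standard Dirichlet normalization constant Γ(α_0)/∏_k Γ(α_k) quoted in (2.38); the α values are arbitrary.

    # Numerical check that the M = 3 Dirichlet density (2.272) integrates to one.
    import numpy as np
    from scipy.integrate import dblquad
    from scipy.special import gamma

    alpha = np.array([2.0, 3.0, 1.5])                  # arbitrary parameters
    C3 = gamma(alpha.sum()) / np.prod(gamma(alpha))    # normalization constant from (2.38)

    def density(mu2, mu1):
        # Dirichlet over (mu1, mu2) with mu3 = 1 - mu1 - mu2 eliminated, as in (2.272).
        return C3 * mu1 ** (alpha[0] - 1) * mu2 ** (alpha[1] - 1) \
                  * (1 - mu1 - mu2) ** (alpha[2] - 1)

    total, _ = dblquad(density, 0, 1, lambda mu1: 0.0, lambda mu1: 1 - mu1)
    print(total)   # approx 1.0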
2.10 ( ) Using the property Γ(x + 1) = xΓ(x) of the gamma function, derive the following results for the mean, variance, and covariance of the Dirichlet distribution given by (2.38)

    E[\mu_j] = \frac{\alpha_j}{\alpha_0}                                          (2.273)

    \mathrm{var}[\mu_j] = \frac{\alpha_j (\alpha_0 - \alpha_j)}{\alpha_0^2 (\alpha_0 + 1)}    (2.274)

    \mathrm{cov}[\mu_j \mu_l] = -\frac{\alpha_j \alpha_l}{\alpha_0^2 (\alpha_0 + 1)}, \qquad j \neq l    (2.275)

where α_0 is defined by (2.39).

2.11 () www By expressing the expectation of ln µ_j under the Dirichlet distribution (2.38) as a derivative with respect to α_j, show that

    E[\ln \mu_j] = \psi(\alpha_j) - \psi(\alpha_0)    (2.276)

where α_0 is given by (2.39) and

    \psi(a) \equiv \frac{d}{da} \ln \Gamma(a)         (2.277)

is the digamma function.
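The Dirichlet moments (2.273)–(2.275) and the digamma identity (2.276) in Exercises 2.10 and 2.11 above can likewise be verified by sampling; the sketch below is an editorial addition with arbitrary α, and, being Monte Carlo, only rough agreement is expected.

    # Monte Carlo check of the Dirichlet moments (2.273)-(2.275) and of E[ln mu_j] in (2.276).
    import numpy as np
    from scipy.special import digamma

    rng = np.random.default_rng(1)
    alpha = np.array([2.0, 5.0, 3.0])
    alpha0 = alpha.sum()
    mu = rng.dirichlet(alpha, size=500_000)

    print(mu.mean(axis=0), alpha / alpha0)                                          # (2.273)
    print(mu.var(axis=0), alpha * (alpha0 - alpha) / (alpha0 ** 2 * (alpha0 + 1)))  # (2.274)
    print(np.cov(mu[:, 0], mu[:, 1])[0, 1],
          -alpha[0] * alpha[1] / (alpha0 ** 2 * (alpha0 + 1)))                      # (2.275)
    print(np.log(mu).mean(axis=0), digamma(alpha) - digamma(alpha0))                # (2.276)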
2.12 () The uniform distribution for a continuous variable x is defined by

    U(x|a, b) = \frac{1}{b - a}, \qquad a \leq x \leq b.    (2.278)

Verify that this distribution is normalized, and find expressions for its mean and variance.

2.13 ( ) Evaluate the Kullback-Leibler divergence (1.113) between two Gaussians p(x) = N(x|µ, Σ) and q(x) = N(x|m, L).

2.14 ( ) www This exercise demonstrates that the multivariate distribution with maximum entropy, for a given covariance, is a Gaussian. The entropy of a distribution p(x) is given by

    H[x] = -\int p(x) \ln p(x) \, dx.    (2.279)

We wish to maximize H[x] over all distributions p(x) subject to the constraints that p(x) be normalized and that it have a specific mean and covariance, so that

    \int p(x) \, dx = 1                                  (2.280)

    \int p(x) \, x \, dx = \mu                           (2.281)

    \int p(x) (x - \mu)(x - \mu)^T \, dx = \Sigma.       (2.282)

By performing a variational maximization of (2.279) and using Lagrange multipliers to enforce the constraints (2.280), (2.281), and (2.282), show that the maximum entropy distribution is given by the Gaussian (2.43).

2.15 ( ) Show that the entropy of the multivariate Gaussian N(x|µ, Σ) is given by

    H[x] = \frac{1}{2} \ln |\Sigma| + \frac{D}{2} (1 + \ln(2\pi))    (2.283)

where D is the dimensionality of x.
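Expression (2.283) is straightforward to confirm against a library implementation; the covariance below is an arbitrary positive-definite example, and this check is an editorial addition rather than part of the exercise.

    # Check the multivariate Gaussian entropy formula (2.283) against SciPy's implementation.
    import numpy as np
    from scipy.stats import multivariate_normal

    D = 3
    Sigma = np.array([[2.0, 0.3, 0.0],
                      [0.3, 1.0, 0.2],
                      [0.0, 0.2, 0.5]])   # symmetric positive definite
    mu = np.zeros(D)

    h_formula = 0.5 * np.log(np.linalg.det(Sigma)) + 0.5 * D * (1 + np.log(2 * np.pi))
    h_scipy = multivariate_normal(mean=mu, cov=Sigma).entropy()
    print(h_formula, h_scipy)   # should agree to numerical precision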
2.16 ( ) www Consider two random variables x_1 and x_2 having Gaussian distributions with means µ_1, µ_2 and precisions τ_1, τ_2 respectively. Derive an expression for the differential entropy of the variable x = x_1 + x_2. To do this, first find the distribution of x by using the relation

    p(x) = \int_{-\infty}^{\infty} p(x|x_2) \, p(x_2) \, dx_2    (2.284)

and completing the square in the exponent. Then observe that this represents the convolution of two Gaussian distributions, which itself will be Gaussian, and finally make use of the result (1.110) for the entropy of the univariate Gaussian.

2.17 () www Consider the multivariate Gaussian distribution given by (2.43). By writing the precision matrix (inverse covariance matrix) Σ^{-1} as the sum of a symmetric and an anti-symmetric matrix, show that the anti-symmetric term does not appear in the exponent of the Gaussian, and hence that the precision matrix may be taken to be symmetric without loss of generality. Because the inverse of a symmetric matrix is also symmetric (see Exercise 2.22), it follows that the covariance matrix may also be chosen to be symmetric without loss of generality.
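The key fact behind Exercise 2.17, that the anti-symmetric part of a matrix contributes nothing to a quadratic form, is easy to check numerically; the random matrices below are an arbitrary editorial example.

    # The anti-symmetric part of a matrix does not contribute to a quadratic form (Exercise 2.17).
    import numpy as np

    rng = np.random.default_rng(2)
    Lambda = rng.normal(size=(4, 4))            # stand-in for a general (not yet symmetric) precision matrix
    Lambda_sym = 0.5 * (Lambda + Lambda.T)      # symmetric part
    Lambda_anti = 0.5 * (Lambda - Lambda.T)     # anti-symmetric part

    x = rng.normal(size=4)
    print(x @ Lambda_anti @ x)                  # approx 0 (up to rounding)
    print(x @ Lambda @ x, x @ Lambda_sym @ x)   # identical: only the symmetric part matters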
2.18 ( ) Consider a real, symmetric matrix Σ whose eigenvalue equation is given by (2.45). By taking the complex conjugate of this equation and subtracting the original equation, and then forming the inner product with eigenvector u_i, show that the eigenvalues λ_i are real. Similarly, use the symmetry property of Σ to show that two eigenvectors u_i and u_j will be orthogonal provided λ_j ≠ λ_i. Finally, show that without loss of generality, the set of eigenvectors can be chosen to be orthonormal, so that they satisfy (2.46), even if some of the eigenvalues are zero.
2.19 ( ) Show that a real, symmetric matrix Σ having the eigenvector equation (2.45) can be expressed as an expansion in the eigenvectors, with coefficients given by the eigenvalues, of the form (2.48). Similarly, show that the inverse matrix Σ^{-1} has a representation of the form (2.49).

2.20 ( ) www A positive definite matrix Σ can be defined as one for which the quadratic form

    a^T \Sigma a    (2.285)

is positive for any real value of the vector a. Show that a necessary and sufficient condition for Σ to be positive definite is that all of the eigenvalues λ_i of Σ, defined by (2.45), are positive.
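The spectral facts in Exercises 2.18–2.20 above can be illustrated numerically for a random symmetric matrix. In the editorial sketch below, (2.48) and (2.49) are read as the usual expansions Σ = Σ_i λ_i u_i u_i^T and Σ^{-1} = Σ_i λ_i^{-1} u_i u_i^T (an assumption about equations not reproduced here).

    # Numerical illustration of Exercises 2.18-2.20: real eigenvalues, orthonormal eigenvectors,
    # eigenvector expansions of Sigma and its inverse, and positive definiteness.
    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(4, 4))
    Sigma = A @ A.T + 4 * np.eye(4)          # symmetric positive-definite test matrix

    lam, U = np.linalg.eigh(Sigma)           # eigh returns real eigenvalues, orthonormal eigenvectors
    print(np.allclose(U.T @ U, np.eye(4)))                      # orthonormality, cf. (2.46)
    print(np.allclose(Sigma, (U * lam) @ U.T))                  # expansion of Sigma, cf. (2.48)
    print(np.allclose(np.linalg.inv(Sigma), (U / lam) @ U.T))   # expansion of the inverse, cf. (2.49)
    print((lam > 0).all())                                      # positive definite <=> all eigenvalues > 0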
2.21 () Show that a real, symmetric matrix of size D × D has D(D + 1)/2 independent parameters.

2.22 () www Show that the inverse of a symmetric matrix is itself symmetric.

2.23 ( ) By diagonalizing the coordinate system using the eigenvector expansion (2.45), show that the volume contained within the hyperellipsoid corresponding to a constant Mahalanobis distance ∆ is given by

    V_D |\Sigma|^{1/2} \Delta^D    (2.286)

where V_D is the volume of the unit sphere in D dimensions, and the Mahalanobis distance is defined by (2.44).
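Formula (2.286) can be checked by Monte Carlo in two dimensions, where V_2 = π; the sketch below is an editorial addition using an arbitrary covariance matrix and an axis-aligned bounding box |x_i| ≤ Δ√Σ_ii that contains the ellipse.

    # Monte Carlo check of the hyperellipsoid volume (2.286) in D = 2, where V_2 = pi.
    import numpy as np

    rng = np.random.default_rng(4)
    Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
    delta = 1.5
    Sigma_inv = np.linalg.inv(Sigma)

    half_widths = delta * np.sqrt(np.diag(Sigma))     # bounding box half-widths
    N = 2_000_000
    x = rng.uniform(-half_widths, half_widths, size=(N, 2))
    inside = np.einsum("ni,ij,nj->n", x, Sigma_inv, x) <= delta ** 2

    box_volume = np.prod(2 * half_widths)
    print(inside.mean() * box_volume)                          # Monte Carlo estimate
    print(np.pi * np.sqrt(np.linalg.det(Sigma)) * delta ** 2)  # V_D |Sigma|^{1/2} Delta^D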
2.24 ( ) www Prove the identity (2.76) by multiplying both sides by the matrix

    \begin{pmatrix} A & B \\ C & D \end{pmatrix}    (2.287)

and making use of the definition (2.77).

2.25 ( ) In Sections 2.3.1 and 2.3.2, we considered the conditional and marginal distributions for a multivariate Gaussian. More generally, we can consider a partitioning of the components of x into three groups x_a, x_b, and x_c, with a corresponding partitioning of the mean vector µ and of the covariance matrix Σ in the form

    \mu = \begin{pmatrix} \mu_a \\ \mu_b \\ \mu_c \end{pmatrix}, \qquad
    \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} & \Sigma_{ac} \\ \Sigma_{ba} & \Sigma_{bb} & \Sigma_{bc} \\ \Sigma_{ca} & \Sigma_{cb} & \Sigma_{cc} \end{pmatrix}.    (2.288)

By making use of the results of Section 2.3, find an expression for the conditional distribution p(x_a|x_b) in which x_c has been marginalized out.

2.26 ( ) A very useful result from linear algebra is the Woodbury matrix inversion formula given by

    (A + BCD)^{-1} = A^{-1} - A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}.    (2.289)

By multiplying both sides by (A + BCD) prove the correctness of this result.
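Since (2.289) is stated explicitly, it can be verified numerically before attempting the algebraic proof; the matrix sizes in this editorial sketch are arbitrary, chosen only so that all the products are defined.

    # Numerical check of the Woodbury matrix inversion formula (2.289).
    import numpy as np

    rng = np.random.default_rng(5)
    n, k = 5, 2
    A = rng.normal(size=(n, n)) + n * np.eye(n)   # keep A comfortably invertible
    B = rng.normal(size=(n, k))
    C = rng.normal(size=(k, k)) + k * np.eye(k)
    D = rng.normal(size=(k, n))

    lhs = np.linalg.inv(A + B @ C @ D)
    A_inv, C_inv = np.linalg.inv(A), np.linalg.inv(C)
    rhs = A_inv - A_inv @ B @ np.linalg.inv(C_inv + D @ A_inv @ B) @ D @ A_inv
    print(np.allclose(lhs, rhs))   # True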
2.27 () Let x and z be two independent random vectors, so that p(x, z) = p(x)p(z). Show that the mean of their sum y = x + z is given by the sum of the means of each of the variables separately. Similarly, show that the covariance matrix of y is given by the sum of the covariance matrices of x and z. Confirm that this result agrees with that of Exercise 1.10.

2.28 ( ) www Consider a joint distribution over the variable

    z = \begin{pmatrix} x \\ y \end{pmatrix}    (2.290)

whose mean and covariance are given by (2.108) and (2.105) respectively. By making use of the results (2.92) and (2.93), show that the marginal distribution p(x) is given by (2.99). Similarly, by making use of the results (2.81) and (2.82), show that the conditional distribution p(y|x) is given by (2.100).

2.29 ( ) Using the partitioned matrix inversion formula (2.76), show that the inverse of the precision matrix (2.104) is given by the covariance matrix (2.105).

2.30 () By starting from (2.107) and making use of the result (2.105), verify the result (2.108).
2.31 ( ) Consider two multidimensional random vectors x and z having Gaussian distributions p(x) = N(x|µ_x, Σ_x) and p(z) = N(z|µ_z, Σ_z) respectively, together with their sum y = x + z. Use the results (2.109) and (2.110) to find an expression for the marginal distribution p(y) by considering the linear-Gaussian model comprising the product of the marginal distribution p(x) and the conditional distribution p(y|x).

2.32 ( ) www This exercise and the next provide practice at manipulating the quadratic forms that arise in linear-Gaussian models, as well as giving an independent check of results derived in the main text. Consider a joint distribution p(x, y) defined by the marginal and conditional distributions given by (2.99) and (2.100). By examining the quadratic form in the exponent of the joint distribution, and using the technique of ‘completing the square’ discussed in Section 2.3, find expressions for the mean and covariance of the marginal distribution p(y) in which the variable x has been integrated out. To do this, make use of the Woodbury matrix inversion formula (2.289). Verify that these results agree with (2.109) and (2.110) obtained using the results of Chapter 2.
2.33 ( ) Consider the same joint distribution as in Exercise 2.32, but now use the technique of completing the square to find expressions for the mean and covariance of the conditional distribution p(x|y). Again, verify that these agree with the corresponding expressions (2.111) and (2.112).

2.34 ( ) www To find the maximum likelihood solution for the covariance matrix of a multivariate Gaussian, we need to maximize the log likelihood function (2.118) with respect to Σ, noting that the covariance matrix must be symmetric and positive definite. Here we proceed by ignoring these constraints and doing a straightforward maximization. Using the results (C.21), (C.26), and (C.28) from Appendix C, show that the covariance matrix Σ that maximizes the log likelihood function (2.118) is given by the sample covariance (2.122). We note that the final result is necessarily symmetric and positive definite (provided the sample covariance is nonsingular).
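As a numerical complement to Exercise 2.34, one can check that, for a fixed data set, the Gaussian log likelihood evaluated at the sample covariance (with the maximum likelihood mean) is not improved by small symmetric perturbations of Σ. This sketch is an editorial addition; the synthetic data set is arbitrary, and the perturbations are assumed small enough to stay positive definite.

    # Spot check for Exercise 2.34: the sample covariance (with 1/N) maximizes the
    # Gaussian log likelihood over nearby symmetric positive-definite matrices.
    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(6)
    X = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], size=2000)

    mu_ml = X.mean(axis=0)
    Sigma_ml = (X - mu_ml).T @ (X - mu_ml) / len(X)     # sample covariance with 1/N

    def loglik(Sigma):
        return multivariate_normal(mean=mu_ml, cov=Sigma).logpdf(X).sum()

    best = loglik(Sigma_ml)
    for _ in range(100):
        E = rng.normal(scale=0.05, size=(2, 2))
        perturbed = Sigma_ml + 0.5 * (E + E.T)          # random symmetric perturbation
        assert loglik(perturbed) <= best + 1e-9
    print("sample covariance is a local maximum of the log likelihood (numerically)")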
2.35 ( ) Use the result (2.59) to prove (2.62).