Bishop C.M. Pattern Recognition and Machine Learning (2006) (811375), страница 33
Текст из файла (страница 33)
Now, using the results (2.59), and (2.62),show thatE[xn xm ] = µµT + Inm Σ(2.291)where xn denotes a data point sampled from a Gaussian distribution with mean µand covariance Σ, and Inm denotes the (n, m) element of the identity matrix. Henceprove the result (2.124).2.36 ( ) www Using an analogous procedure to that used to obtain (2.126), derivean expression for the sequential estimation of the variance of a univariate Gaussian1342.
PROBABILITY DISTRIBUTIONSdistribution, by starting with the maximum likelihood expression2σMLN1 =(xn − µ)2 .N(2.292)n=1Verify that substituting the expression for a Gaussian distribution into the RobbinsMonro sequential estimation formula (2.135) gives a result of the same form, andhence obtain an expression for the corresponding coefficients aN .2.37 ( ) Using an analogous procedure to that used to obtain (2.126), derive an expression for the sequential estimation of the covariance of a multivariate Gaussiandistribution, by starting with the maximum likelihood expression (2.122).
Verify thatsubstituting the expression for a Gaussian distribution into the Robbins-Monro sequential estimation formula (2.135) gives a result of the same form, and hence obtainan expression for the corresponding coefficients aN .2.38 () Use the technique of completing the square for the quadratic form in the exponent to derive the results (2.141) and (2.142).2.39 ( ) Starting from the results (2.141) and (2.142) for the posterior distributionof the mean of a Gaussian random variable, dissect out the contributions from thefirst N − 1 data points and hence obtain expressions for the sequential update of2. Now derive the same results starting from the posterior distributionµN and σN2p(µ|x1 , . . .
, xN −1 ) = N (µ|µN −1 , σN−1 ) and multiplying by the likelihood func2tion p(xN |µ) = N (xN |µ, σ ) and then completing the square and normalizing toobtain the posterior distribution after N observations.2.40 ( ) www Consider a D-dimensional Gaussian random variable x with distribution N (x|µ, Σ) in which the covariance Σ is known and for which we wish to inferthe mean µ from a set of observations X = {x1 , .
. . , xN }. Given a prior distributionp(µ) = N (µ|µ0 , Σ0 ), find the corresponding posterior distribution p(µ|X).2.41 () Use the definition of the gamma function (1.141) to show that the gamma distribution (2.146) is normalized.2.42 ( ) Evaluate the mean, variance, and mode of the gamma distribution (2.146).2.43 () The following distributionp(x|σ 2 , q) =|x|qexp−2σ 22(2σ 2 )1/q Γ(1/q)q(2.293)is a generalization of the univariate Gaussian distribution. Show that this distributionis normalized so that ∞p(x|σ 2 , q) dx = 1(2.294)−∞and that it reduces to the Gaussian when q = 2.
Consider a regression model inwhich the target variable is given by t = y(x, w) + and is a random noiseExercises135variable drawn from the distribution (2.293). Show that the log likelihood functionover w and σ 2 , for an observed data set of input vectors X = {x1 , . . .
, xN } andcorresponding target variables t = (t1 , . . . , tN )T , is given byN1 N|y(xn , w) − tn |q −ln(2σ 2 ) + constln p(t|X, w, σ ) = − 22σq2(2.295)n=1where ‘const’ denotes terms independent of both w and σ 2 . Note that, as a functionof w, this is the Lq error function considered in Section 1.5.5.2.44 ( ) Consider a univariate Gaussian distribution N (x|µ, τ −1 ) having conjugateGaussian-gamma prior given by (2.154), and a data set x = {x1 , . . . , xN } of i.i.d.observations. Show that the posterior distribution is also a Gaussian-gamma distribution of the same functional form as the prior, and write down expressions for theparameters of this posterior distribution.2.45 () Verify that the Wishart distribution defined by (2.155) is indeed a conjugateprior for the precision matrix of a multivariate Gaussian.2.46 () wwwVerify that evaluating the integral in (2.158) leads to the result (2.159).2.47 () www Show that in the limit ν → ∞, the t-distribution (2.159) becomes aGaussian.
Hint: ignore the normalization coefficient, and simply look at the dependence on x.2.48 () By following analogous steps to those used to derive the univariate Student’st-distribution (2.159), verify the result (2.162) for the multivariate form of the Student’s t-distribution, by marginalizing over the variable η in (2.161). Using thedefinition (2.161), show by exchanging integration variables that the multivariatet-distribution is correctly normalized.2.49 ( ) By using the definition (2.161) of the multivariate Student’s t-distribution as aconvolution of a Gaussian with a gamma distribution, verify the properties (2.164),(2.165), and (2.166) for the multivariate t-distribution defined by (2.162).2.50 () Show that in the limit ν → ∞, the multivariate Student’s t-distribution (2.162)reduces to a Gaussian with mean µ and precision Λ.2.51 () www The various trigonometric identities used in the discussion of periodicvariables in this chapter can be proven easily from the relationexp(iA) = cos A + i sin A(2.296)in which i is the square root of minus one.
By considering the identityexp(iA) exp(−iA) = 1(2.297)prove the result (2.177). Similarly, using the identitycos(A − B) = exp{i(A − B)}(2.298)1362. PROBABILITY DISTRIBUTIONSwhere denotes the real part, prove (2.178). Finally, by using sin(A − B) = exp{i(A − B)}, where denotes the imaginary part, prove the result (2.183).2.52 ( ) For large m, the von Mises distribution (2.179) becomes sharply peakedaround the mode θ0 .
By defining ξ = m1/2 (θ − θ0 ) and making the Taylor expansion of the cosine function given byα2+ O(α4 )2show that as m → ∞, the von Mises distribution tends to a Gaussian.cos α = 1 −(2.299)2.53 () Using the trigonometric identity (2.183), show that solution of (2.182) for θ0 isgiven by (2.184).2.54 () By computing first and second derivatives of the von Mises distribution (2.179),and using I0 (m) > 0 for m > 0, show that the maximum of the distribution occurswhen θ = θ0 and that the minimum occurs when θ = θ0 + π (mod 2π).2.55 () By making use of the result (2.168), together with (2.184) and the trigonometricidentity (2.178), show that the maximum likelihood solution mML for the concentration of the von Mises distribution satisfies A(mML ) = r where r is the radius of themean of the observations viewed as unit vectors in the two-dimensional Euclideanplane, as illustrated in Figure 2.17.2.56 ( ) www Express the beta distribution (2.13), the gamma distribution (2.146),and the von Mises distribution (2.179) as members of the exponential family (2.194)and thereby identify their natural parameters.2.57 () Verify that the multivariate Gaussian distribution can be cast in exponentialfamily form (2.194) and derive expressions for η, u(x), h(x) and g(η) analogous to(2.220)–(2.223).2.58 () The result (2.226) showed that the negative gradient of ln g(η) for the exponential family is given by the expectation of u(x).
By taking the second derivatives of(2.195), show that−∇∇ ln g(η) = E[u(x)u(x)T ] − E[u(x)]E[u(x)T ] = cov[u(x)].(2.300)2.59 () By changing variables using y = x/σ, show that the density (2.236) will becorrectly normalized, provided f (x) is correctly normalized.2.60 ( ) www Consider a histogram-like density model in which the space x is divided into fixed regions for which the density p(x) takes the constant value hi overthe ith region, and that the volume of region i is denoted ∆i .
Suppose we have a setof N observations of x such that ni of these observations fall in region i. Using aLagrange multiplier to enforce the normalization constraint on the density, derive anexpression for the maximum likelihood estimator for the {hi }.2.61 () Show that the K-nearest-neighbour density model defines an improper distribution whose integral over all space is divergent.3LinearModels forRegressionThe focus so far in this book has been on unsupervised learning, including topicssuch as density estimation and data clustering.
We turn now to a discussion of supervised learning, starting with regression. The goal of regression is to predict the valueof one or more continuous target variables t given the value of a D-dimensional vector x of input variables. We have already encountered an example of a regressionproblem when we considered polynomial curve fitting in Chapter 1. The polynomialis a specific example of a broad class of functions called linear regression models,which share the property of being linear functions of the adjustable parameters, andwhich will form the focus of this chapter.