St(x|\mu, \Lambda, \nu) = \frac{\Gamma(D/2 + \nu/2)}{\Gamma(\nu/2)} \frac{|\Lambda|^{1/2}}{(\pi\nu)^{D/2}} \left[ 1 + \frac{\Delta^2}{\nu} \right]^{-D/2 - \nu/2}    (2.162)

where D is the dimensionality of x, and \Delta^2 is the squared Mahalanobis distance defined by

\Delta^2 = (x - \mu)^T \Lambda (x - \mu).    (2.163)

This is the multivariate form of Student's t-distribution and satisfies the following properties (Exercise 2.49)

E[x] = \mu                                if \nu > 1    (2.164)
cov[x] = \frac{\nu}{\nu - 2} \Lambda^{-1}    if \nu > 2    (2.165)
mode[x] = \mu                                             (2.166)

with corresponding results for the univariate case.
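The density (2.162) is straightforward to evaluate numerically. The following is a minimal sketch, not taken from the book, using NumPy and SciPy; the function name and the example arguments are illustrative choices.

import numpy as np
from scipy.special import gammaln

def multivariate_t_logpdf(x, mu, Lam, nu):
    """Log density of St(x | mu, Lam, nu) as defined in (2.162).

    x, mu : 1-D arrays of length D
    Lam   : D x D precision matrix
    nu    : degrees of freedom
    """
    D = len(mu)
    diff = x - mu
    delta2 = diff @ Lam @ diff                  # squared Mahalanobis distance (2.163)
    _, logdet = np.linalg.slogdet(Lam)          # log |Lambda|
    return (gammaln(D / 2 + nu / 2) - gammaln(nu / 2)
            + 0.5 * logdet - (D / 2) * np.log(np.pi * nu)
            - (D / 2 + nu / 2) * np.log1p(delta2 / nu))

# Density at the mean of a 2-D t-distribution with unit precision and nu = 3.
print(np.exp(multivariate_t_logpdf(np.zeros(2), np.zeros(2), np.eye(2), 3.0)))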
2.3.8 Periodic variables

Although Gaussian distributions are of great practical significance, both in their own right and as building blocks for more complex probabilistic models, there are situations in which they are inappropriate as density models for continuous variables. One important case, which arises in practical applications, is that of periodic variables.

An example of a periodic variable would be the wind direction at a particular geographical location. We might, for instance, measure values of wind direction on a number of days and wish to summarize this using a parametric distribution. Another example is calendar time, where we may be interested in modelling quantities that are believed to be periodic over 24 hours or over an annual cycle.
Such quantities can conveniently be represented using an angular (polar) coordinate 0 ≤ θ < 2π.

We might be tempted to treat periodic variables by choosing some direction as the origin and then applying a conventional distribution such as the Gaussian. Such an approach, however, would give results that were strongly dependent on the arbitrary choice of origin. Suppose, for instance, that we have two observations at θ1 = 1° and θ2 = 359°, and we model them using a standard univariate Gaussian distribution. If we choose the origin at 0°, then the sample mean of this data set will be 180° with standard deviation 179°, whereas if we choose the origin at 180°, then the mean will be 0° and the standard deviation will be 1°.
We clearly need to develop a special approach for the treatment of periodic variables.

Let us consider the problem of evaluating the mean of a set of observations D = {θ1, . . . , θN} of a periodic variable. From now on, we shall assume that θ is measured in radians. We have already seen that the simple average (θ1 + · · · + θN)/N will be strongly coordinate dependent.
To find an invariant measure of the mean, we note that the observations can be viewed as points on the unit circle and can therefore be described instead by two-dimensional unit vectors x1, . . . , xN where ‖xn‖ = 1 for n = 1, . . . , N, as illustrated in Figure 2.17. We can average the vectors {xn} instead to give

\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n    (2.167)

and then find the corresponding angle θ̄ of this average. Clearly, this definition will ensure that the location of the mean is independent of the origin of the angular coordinate. Note that x̄ will typically lie inside the unit circle.

Figure 2.17: Illustration of the representation of values θn of a periodic variable as two-dimensional vectors xn living on the unit circle. Also shown is the average x̄ of those vectors.
The Cartesian coordinates of the observations are given by xn = (cos θn, sin θn), and we can write the Cartesian coordinates of the sample mean in the form x̄ = (r̄ cos θ̄, r̄ sin θ̄). Substituting into (2.167) and equating the x1 and x2 components then gives

\bar{r} \cos\bar{\theta} = \frac{1}{N} \sum_{n=1}^{N} \cos\theta_n,    \bar{r} \sin\bar{\theta} = \frac{1}{N} \sum_{n=1}^{N} \sin\theta_n.    (2.168)

Taking the ratio, and using the identity tan θ = sin θ/cos θ, we can solve for θ̄ to give

\bar{\theta} = \tan^{-1} \left\{ \frac{\sum_n \sin\theta_n}{\sum_n \cos\theta_n} \right\}.    (2.169)

Shortly, we shall see how this result arises naturally as the maximum likelihood estimator for an appropriately defined distribution over a periodic variable.
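The following is a minimal numerical sketch, not taken from the book, of the circular mean (2.167)-(2.169); np.arctan2 is used in place of the plain inverse tangent so that the correct quadrant is obtained automatically, and the two example angles are those used in the text.

import numpy as np

def circular_mean(theta):
    """Mean direction of angles theta (in radians), following (2.167)-(2.169)."""
    s = np.mean(np.sin(theta))          # x2 component of the average vector
    c = np.mean(np.cos(theta))          # x1 component of the average vector
    return np.arctan2(s, c)             # angle of the average vector x-bar

# The two observations at 1 degree and 359 degrees used earlier:
theta = np.deg2rad([1.0, 359.0])
print(np.rad2deg(circular_mean(theta)))   # approximately 0 degrees, independent of the origin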
We now consider a periodic generalization of the Gaussian called the von Mises distribution. Here we shall limit our attention to univariate distributions, although periodic distributions can also be found over hyperspheres of arbitrary dimension. For an extensive discussion of periodic distributions, see Mardia and Jupp (2000).

By convention, we will consider distributions p(θ) that have period 2π. Any probability density p(θ) defined over θ must not only be nonnegative and integrate to one, but it must also be periodic.

Figure 2.18: The von Mises distribution can be derived by considering a two-dimensional Gaussian of the form (2.173), whose density contours are shown in blue, and conditioning on the unit circle shown in red.
Thus p(θ) must satisfy the three conditions

p(\theta) \geq 0    (2.170)
\int_0^{2\pi} p(\theta)\, d\theta = 1    (2.171)
p(\theta + 2\pi) = p(\theta).    (2.172)

From (2.172), it follows that p(θ + M2π) = p(θ) for any integer M.

We can easily obtain a Gaussian-like distribution that satisfies these three properties as follows. Consider a Gaussian distribution over two variables x = (x1, x2) having mean µ = (µ1, µ2) and a covariance matrix Σ = σ²I where I is the 2 × 2 identity matrix, so that

p(x_1, x_2) = \frac{1}{2\pi\sigma^2} \exp\left\{ -\frac{(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2}{2\sigma^2} \right\}.    (2.173)

The contours of constant p(x) are circles, as illustrated in Figure 2.18. Now suppose we consider the value of this distribution along a circle of fixed radius.
Then by construction this distribution will be periodic, although it will not be normalized. We can determine the form of this distribution by transforming from Cartesian coordinates (x1, x2) to polar coordinates (r, θ) so that

x_1 = r \cos\theta,    x_2 = r \sin\theta.    (2.174)

We also map the mean µ into polar coordinates by writing

\mu_1 = r_0 \cos\theta_0,    \mu_2 = r_0 \sin\theta_0.    (2.175)

Next we substitute these transformations into the two-dimensional Gaussian distribution (2.173), and then condition on the unit circle r = 1, noting that we are interested only in the dependence on θ. Focussing on the exponent in the Gaussian distribution we have

-\frac{1}{2\sigma^2} \left\{ (r\cos\theta - r_0\cos\theta_0)^2 + (r\sin\theta - r_0\sin\theta_0)^2 \right\}
  = -\frac{1}{2\sigma^2} \left\{ 1 + r_0^2 - 2r_0\cos\theta\cos\theta_0 - 2r_0\sin\theta\sin\theta_0 \right\}
  = \frac{r_0}{\sigma^2} \cos(\theta - \theta_0) + \mathrm{const}    (2.176)
where 'const' denotes terms independent of θ, and we have made use of the following trigonometrical identities (Exercise 2.51)

\cos^2 A + \sin^2 A = 1    (2.177)
\cos A \cos B + \sin A \sin B = \cos(A - B).    (2.178)

If we now define m = r_0/\sigma^2, we obtain our final expression for the distribution of p(θ) along the unit circle r = 1 in the form

p(\theta|\theta_0, m) = \frac{1}{2\pi I_0(m)} \exp\left\{ m \cos(\theta - \theta_0) \right\}    (2.179)

which is called the von Mises distribution, or the circular normal. Here the parameter θ0 corresponds to the mean of the distribution, while m, which is known as the concentration parameter, is analogous to the inverse variance (precision) for the Gaussian.

Figure 2.19: The von Mises distribution plotted for two different parameter values (m = 5, θ0 = π/4 and m = 1, θ0 = 3π/4), shown as a Cartesian plot on the left and as the corresponding polar plot on the right.
The normalization coefficient in (2.179) is expressed in terms of I_0(m), which is the zeroth-order modified Bessel function of the first kind (Abramowitz and Stegun, 1965) and is defined by

I_0(m) = \frac{1}{2\pi} \int_0^{2\pi} \exp\left\{ m \cos\theta \right\} d\theta.    (2.180)

For large m, the distribution becomes approximately Gaussian (Exercise 2.52). The von Mises distribution is plotted in Figure 2.19, and the function I_0(m) is plotted in Figure 2.20.
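As an aside, here is a minimal sketch, not taken from the book, of evaluating the density (2.179) directly, with the normalization (2.180) supplied by NumPy's built-in modified Bessel function np.i0; scipy.stats.vonmises provides an equivalent ready-made implementation.

import numpy as np

def von_mises_pdf(theta, theta0, m):
    """p(theta | theta0, m) as in (2.179); periodic in theta with period 2*pi."""
    return np.exp(m * np.cos(theta - theta0)) / (2.0 * np.pi * np.i0(m))

# Check the normalization condition (2.171) with a simple Riemann sum over one period.
grid = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
print(np.mean(von_mises_pdf(grid, np.pi / 4, 5.0)) * 2.0 * np.pi)   # close to 1
# Check the periodicity condition (2.172).
print(von_mises_pdf(0.3, np.pi / 4, 5.0), von_mises_pdf(0.3 + 2.0 * np.pi, np.pi / 4, 5.0))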
Now consider the maximum likelihood estimators for the parameters θ0 and m for the von Mises distribution. The log likelihood function is given by

\ln p(D|\theta_0, m) = -N \ln(2\pi) - N \ln I_0(m) + m \sum_{n=1}^{N} \cos(\theta_n - \theta_0).    (2.181)

Setting the derivative with respect to θ0 equal to zero gives

\sum_{n=1}^{N} \sin(\theta_n - \theta_0) = 0.    (2.182)

To solve for θ0 (Exercise 2.53), we make use of the trigonometric identity

\sin(A - B) = \cos B \sin A - \cos A \sin B    (2.183)

from which we obtain

\theta_0^{ML} = \tan^{-1} \left\{ \frac{\sum_n \sin\theta_n}{\sum_n \cos\theta_n} \right\}    (2.184)

which we recognize as the result (2.169) obtained earlier for the mean of the observations viewed in a two-dimensional Cartesian space.

Similarly, maximizing (2.181) with respect to m, and making use of I_0'(m) = I_1(m) (Abramowitz and Stegun, 1965), we have

A(m_{ML}) = \frac{1}{N} \sum_{n=1}^{N} \cos(\theta_n - \theta_0^{ML})    (2.185)

where we have substituted for the maximum likelihood solution for θ0^{ML} (recalling that we are performing a joint optimization over θ0 and m), and we have defined

A(m) = \frac{I_1(m)}{I_0(m)}.    (2.186)

The function A(m) is plotted in Figure 2.20.

Figure 2.20: Plot of the Bessel function I_0(m) defined by (2.180), together with the function A(m) defined by (2.186).
Making use of the trigonometric identity (2.178), we can write (2.185) in the form

A(m_{ML}) = \left( \frac{1}{N} \sum_{n=1}^{N} \cos\theta_n \right) \cos\theta_0^{ML} + \left( \frac{1}{N} \sum_{n=1}^{N} \sin\theta_n \right) \sin\theta_0^{ML}.    (2.187)

The right-hand side of (2.187) is easily evaluated, and the function A(m) can be inverted numerically.

Figure 2.21: Plots of the 'old faithful' data in which the blue curves show contours of constant probability density. On the left is a single Gaussian distribution which has been fitted to the data using maximum likelihood. Note that this distribution fails to capture the two clumps in the data and indeed places much of its probability mass in the central region between the clumps where the data are relatively sparse. On the right the distribution is given by a linear combination of two Gaussians which has been fitted to the data by maximum likelihood using techniques discussed in Chapter 9, and which gives a better representation of the data.

For completeness, we mention briefly some alternative techniques for the construction of periodic distributions.
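Before turning to those alternatives, the following is a minimal sketch, not taken from the book, of the complete maximum likelihood fit: θ0 is obtained from (2.184) and m by numerically inverting A(m) as in (2.185)-(2.187). The root-finding bracket and the use of SciPy's exponentially scaled Bessel functions i0e and i1e (whose ratio equals I1(m)/I0(m) while avoiding overflow) are illustrative choices.

import numpy as np
from scipy.special import i0e, i1e       # exponentially scaled I0, I1; their ratio equals A(m)
from scipy.optimize import brentq

def fit_von_mises(theta):
    """Maximum likelihood estimates (theta0, m) for the von Mises distribution."""
    s, c = np.mean(np.sin(theta)), np.mean(np.cos(theta))
    theta0 = np.arctan2(s, c)                                   # (2.184)
    r_bar = np.hypot(s, c)                                      # right-hand side of (2.187)
    m = brentq(lambda m: i1e(m) / i0e(m) - r_bar, 1e-8, 1e4)    # invert A(m) in (2.185)
    return theta0, m

# Example on synthetic angles drawn from a von Mises with theta0 = pi/4 and m = 5.
rng = np.random.default_rng(0)
theta = rng.vonmises(mu=np.pi / 4, kappa=5.0, size=2000)
print(fit_von_mises(theta))   # should recover values close to (0.785, 5.0)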