Now the joint pdf of y(m) and the kth Gaussian component of the mixture density can be written as

\[
f_{Y,K\mid\Theta}\big(y(m),k \mid \hat{\theta}^{\,i}\big)
= \hat{P}_k^{\,i}\, f_k\big(y(m)\mid \hat{\theta}_k^{\,i}\big)
= \hat{P}_k^{\,i}\, \mathcal{N}_k\big(y(m); \hat{\mu}_k^{\,i}, \hat{\Sigma}_k^{\,i}\big)
\tag{4.118}
\]

where N_k(y(m); μ_k, Σ_k) is a Gaussian density with mean vector μ_k and covariance matrix Σ_k:

\[
\mathcal{N}_k\big(y(m);\mu_k,\Sigma_k\big)
= \frac{1}{(2\pi)^{P/2}\,\lvert\Sigma_k\rvert^{1/2}}
\exp\!\Big(-\tfrac{1}{2}\big(y(m)-\mu_k\big)^{\mathrm{T}}\Sigma_k^{-1}\big(y(m)-\mu_k\big)\Big)
\tag{4.119}
\]

The pdf of y(m) as a mixture of K Gaussian densities is given by

\[
f_{Y\mid\Theta}\big(y(m)\mid\hat{\theta}^{\,i}\big)
= \mathcal{N}\big(y(m)\mid\hat{\theta}^{\,i}\big)
= \sum_{k=1}^{K} \hat{P}_k^{\,i}\, \mathcal{N}_k\big(y(m); \hat{\mu}_k^{\,i}, \hat{\Sigma}_k^{\,i}\big)
\tag{4.120}
\]

Substitution of the Gaussian densities of Equations (4.118) and (4.120) in Equation (4.117) yields

\[
\begin{aligned}
U\big[(\mu,\Sigma,P),(\hat{\mu}^{\,i},\hat{\Sigma}^{\,i},\hat{P}^{\,i})\big]
&= \sum_{m=0}^{N-1}\sum_{k=1}^{K}
\frac{\hat{P}_k^{\,i}\,\mathcal{N}_k\big(y(m);\hat{\mu}_k^{\,i},\hat{\Sigma}_k^{\,i}\big)}
{\mathcal{N}\big(y(m)\mid\hat{\Theta}^{\,i}\big)}
\ln\!\big[P_k\,\mathcal{N}_k\big(y(m);\mu_k,\Sigma_k\big)\big] \\
&= \sum_{m=0}^{N-1}\sum_{k=1}^{K}
\frac{\hat{P}_k^{\,i}\,\mathcal{N}_k\big(y(m);\hat{\mu}_k^{\,i},\hat{\Sigma}_k^{\,i}\big)}
{\mathcal{N}\big(y(m)\mid\hat{\Theta}^{\,i}\big)} \ln P_k
+ \sum_{m=0}^{N-1}\sum_{k=1}^{K}
\frac{\hat{P}_k^{\,i}\,\mathcal{N}_k\big(y(m);\hat{\mu}_k^{\,i},\hat{\Sigma}_k^{\,i}\big)}
{\mathcal{N}\big(y(m)\mid\hat{\Theta}^{\,i}\big)}
\ln \mathcal{N}_k\big(y(m);\mu_k,\Sigma_k\big)
\end{aligned}
\tag{4.121}
\]

Equation (4.121) is maximised with respect to the parameter P_k using the constrained optimisation method.
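As a concrete numerical illustration of Equations (4.119) and (4.120), the following minimal Python/NumPy sketch evaluates a Gaussian component density and the resulting K-component mixture density; the function names and the example parameter values are illustrative rather than taken from the text.

```python
import numpy as np

def gaussian_pdf(y, mu, cov):
    """Multivariate Gaussian density N(y; mu, cov) of Equation (4.119)."""
    P = len(mu)
    diff = y - mu
    norm = (2 * np.pi) ** (P / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def mixture_pdf(y, weights, means, covs):
    """Mixture density of Equation (4.120): sum_k P_k N_k(y; mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf(y, mu, cov)
               for w, mu, cov in zip(weights, means, covs))

# Example with K = 2 two-dimensional components (illustrative values).
weights = [0.4, 0.6]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(mixture_pdf(np.array([1.0, 1.0]), weights, means, covs))
```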
This involves subtracting from the right-hand side of Equation (4.121) a Lagrange multiplier term λ(Σ_k P_k − 1), which enforces the constraint that the mixture weights sum to one, and then setting the derivative of the resulting expression with respect to P_k to zero; this yields

\[
\hat{P}_k^{\,i+1}
= \arg\max_{P_k} U\big[(\mu,\Sigma,P),(\hat{\mu}^{\,i},\hat{\Sigma}^{\,i},\hat{P}^{\,i})\big]
= \frac{1}{N}\sum_{m=0}^{N-1}
\frac{\hat{P}_k^{\,i}\,\mathcal{N}_k\big(y(m);\hat{\mu}_k^{\,i},\hat{\Sigma}_k^{\,i}\big)}
{\mathcal{N}\big(y(m)\mid\hat{\Theta}^{\,i}\big)}
\tag{4.122}
\]

The parameters μ_k and Σ_k that maximise the function U are obtained by setting the derivative of the function with respect to these parameters to zero:

\[
\hat{\mu}_k^{\,i+1}
= \arg\max_{\mu_k} U\big[(\mu,\Sigma,P),(\hat{\mu}^{\,i},\hat{\Sigma}^{\,i},\hat{P}^{\,i})\big]
= \frac{\displaystyle\sum_{m=0}^{N-1}
\frac{\hat{P}_k^{\,i}\,\mathcal{N}_k\big(y(m);\hat{\mu}_k^{\,i},\hat{\Sigma}_k^{\,i}\big)}
{\mathcal{N}\big(y(m)\mid\hat{\Theta}^{\,i}\big)}\; y(m)}
{\displaystyle\sum_{m=0}^{N-1}
\frac{\hat{P}_k^{\,i}\,\mathcal{N}_k\big(y(m);\hat{\mu}_k^{\,i},\hat{\Sigma}_k^{\,i}\big)}
{\mathcal{N}\big(y(m)\mid\hat{\Theta}^{\,i}\big)}}
\tag{4.123}
\]

and

\[
\hat{\Sigma}_k^{\,i+1}
= \arg\max_{\Sigma_k} U\big[(\mu,\Sigma,P),(\hat{\mu}^{\,i},\hat{\Sigma}^{\,i},\hat{P}^{\,i})\big]
= \frac{\displaystyle\sum_{m=0}^{N-1}
\frac{\hat{P}_k^{\,i}\,\mathcal{N}_k\big(y(m);\hat{\mu}_k^{\,i},\hat{\Sigma}_k^{\,i}\big)}
{\mathcal{N}\big(y(m)\mid\hat{\Theta}^{\,i}\big)}
\big(y(m)-\hat{\mu}_k^{\,i}\big)\big(y(m)-\hat{\mu}_k^{\,i}\big)^{\mathrm{T}}}
{\displaystyle\sum_{m=0}^{N-1}
\frac{\hat{P}_k^{\,i}\,\mathcal{N}_k\big(y(m);\hat{\mu}_k^{\,i},\hat{\Sigma}_k^{\,i}\big)}
{\mathcal{N}\big(y(m)\mid\hat{\Theta}^{\,i}\big)}}
\tag{4.124}
\]

Equations (4.122)–(4.124) are the estimates of the parameters of a mixture Gaussian pdf model.
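Equations (4.122)–(4.124) map directly onto one iteration of the EM method. The sketch below is a minimal NumPy/SciPy illustration of such an iteration, assuming the observations y(0), ..., y(N−1) are stacked in an N-by-P array Y; the function name em_step and the variable names are illustrative, and practical details such as covariance regularisation and convergence checks are omitted.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(Y, weights, means, covs):
    """One EM iteration for a K-component Gaussian mixture,
    following Equations (4.122)-(4.124)."""
    N, _ = Y.shape
    K = len(weights)

    # E-step: posterior weight of component k for each sample,
    # i.e. P_k N_k(y(m)) / N(y(m)) as in Equation (4.121).
    resp = np.column_stack([
        weights[k] * multivariate_normal.pdf(Y, means[k], covs[k])
        for k in range(K)
    ])
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step updates.
    Nk = resp.sum(axis=0)
    new_weights = Nk / N                                    # Equation (4.122)
    new_means = [resp[:, k] @ Y / Nk[k] for k in range(K)]  # Equation (4.123)
    new_covs = []
    for k in range(K):                                      # Equation (4.124)
        diff = Y - means[k]
        new_covs.append((resp[:, k, None] * diff).T @ diff / Nk[k])
    return new_weights, new_means, new_covs
```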
Equations (4.122)–(4.124) can be used in further iterations of the EM method until the parameter estimates converge.

4.6 Bayesian Classification

Classification is the processing and labelling of an observation sequence {y(m)} with one of M classes of signals {Ck; k=1, ..., M} that could have generated the observation. Classifiers are present in all modern digital communication systems and in applications such as the decoding of discrete-valued symbols in digital communication receivers, speech compression, video compression, speech recognition, image recognition, character recognition, signal/noise classification and detectors. For example, in an M-symbol digital communication system, the channel output signal is classified as one of the M signalling symbols; in speech recognition, segments of speech signals are labelled with one of about 40 elementary phoneme sounds; and in speech or video compression, a segment of speech samples or a block of image pixels is quantised and labelled with one of a number of prototype signal vectors in a codebook.

[Figure 4.16 Illustration of the overlap of the distributions of two classes of signals: f_X(x) versus x for classes C1 and C2, with means μ1 and μ2 and decision threshold θthrsh.]
In the design of a classifier, the aim is to reduce the classification error given the constraints on the signal-to-noise ratio, the bandwidth and the computational resources. Classification errors are due to overlap of the distributions of different classes of signals. This is illustrated in Figure 4.16 for a binary classification problem with two Gaussian-distributed signal classes C1 and C2. In the shaded region, where the signal distributions overlap, a sample x could belong to either of the two classes.
The shaded area gives a measure of the classification error. The obvious solution suggested by Figure 4.16 for reducing the classification error is to reduce the overlap of the distributions. The overlap can be reduced in two ways: (a) by increasing the distance between the mean values of different classes, and (b) by reducing the variance of each class. In telecommunication systems the overlap between the signal classes is reduced using a combination of several methods, including increasing the signal-to-noise ratio, increasing the distance between signal patterns by adding redundant error control coding bits, and signal shaping and post-filtering operations.
In pattern recognition, where it is not possible to control the signal generation process (as in speech and image recognition), the choice of the pattern features and models affects the classification error. The design of an efficient classifier for pattern recognition depends on a number of factors, which can be listed as follows:
(1) Extraction and transformation of a set of discriminative features from the signal that can aid the classification process.
The features need to adequately characterise each class and emphasise the differences between the various classes.
(2) Statistical modelling of the observation features for each class. For Bayesian classification, a posterior probability model for each class should be obtained.
(3) Labelling of an unlabelled signal with one of the M classes.

4.6.1 Binary Classification

The simplest form of classification is the labelling of an observation with one of two classes of signals. Figures 4.17(a) and 4.17(b) illustrate two examples of a simple binary classification problem in a two-dimensional signal space. In each case, the observation is the result of a random mapping (e.g.
signal plus noise) from the binary source to the continuous observation space. In Figure 4.17(a), the binary sources and the observation space associated with each source are well separated, and it is possible to make an error-free classification of each observation. In Figure 4.17(b) there is less distance between the means of the sources, and the observation signals have a greater spread. This results in some overlap of the signal spaces, and classification errors can occur.

[Figure 4.17 Illustration of binary classification: (a) the discrete source space (s1, s2) and the noisy observation spaces (y1, y2) are well separated; (b) the observation spaces overlap.]

In binary classification, a signal x is labelled with the class that scores the higher posterior probability:

\[
P_{C\mid X}(C_1\mid x) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; P_{C\mid X}(C_2\mid x)
\tag{4.125}
\]

Using Bayes' rule, Equation (4.125) can be rewritten as

\[
P_C(C_1)\, f_{X\mid C}(x\mid C_1) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; P_C(C_2)\, f_{X\mid C}(x\mid C_2)
\tag{4.126}
\]

Letting P_C(C_1) = P_1 and P_C(C_2) = P_2, Equation (4.126) is often written in terms of a likelihood ratio test as

\[
\frac{f_{X\mid C}(x\mid C_1)}{f_{X\mid C}(x\mid C_2)} \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \frac{P_2}{P_1}
\tag{4.127}
\]

Taking the logarithm of the likelihood ratio yields the following discriminant function:

\[
h(x) = \ln f_{X\mid C}(x\mid C_1) - \ln f_{X\mid C}(x\mid C_2) \;\underset{C_2}{\overset{C_1}{\gtrless}}\; \ln\frac{P_2}{P_1}
\tag{4.128}
\]

Now assume that the signal in each class has a Gaussian distribution with a probability density function given by

\[
f_{X\mid C}(x\mid c_i) = \frac{1}{(2\pi)^{P/2}\lvert\Sigma_i\rvert^{1/2}}
\exp\!\Big(-\tfrac{1}{2}(x-\mu_i)^{\mathrm{T}}\Sigma_i^{-1}(x-\mu_i)\Big), \quad i=1,2
\tag{4.129}
\]

From Equations (4.128) and (4.129), the discriminant function h(x) becomes

\[
h(x) = -\tfrac{1}{2}(x-\mu_1)^{\mathrm{T}}\Sigma_1^{-1}(x-\mu_1)
+ \tfrac{1}{2}(x-\mu_2)^{\mathrm{T}}\Sigma_2^{-1}(x-\mu_2)
+ \tfrac{1}{2}\ln\frac{\lvert\Sigma_2\rvert}{\lvert\Sigma_1\rvert}
\;\underset{C_2}{\overset{C_1}{\gtrless}}\; \ln\frac{P_2}{P_1}
\tag{4.130}
\]

Example 4.10 For two Gaussian-distributed classes of scalar-valued signals with distributions N(x(m); μ_1, σ²) and N(x(m); μ_2, σ²), and equal class probabilities P_1 = P_2 = 0.5, the discriminant function of Equation (4.130) becomes

\[
h\big(x(m)\big) = \frac{\mu_1-\mu_2}{\sigma^2}\, x(m) + \frac{\mu_2^2-\mu_1^2}{2\sigma^2}
\;\underset{C_2}{\overset{C_1}{\gtrless}}\; 0
\tag{4.131}
\]

Hence the rule for signal classification becomes

\[
x(m) \;\underset{C_2}{\overset{C_1}{\lessgtr}}\; \frac{\mu_1+\mu_2}{2}
\tag{4.132}
\]

The signal is labelled with class C1 if x(m) < (μ_1 + μ_2)/2 and with class C2 otherwise (assuming μ_1 < μ_2, as in Figure 4.16).
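A minimal sketch of this decision rule is given below: it evaluates the Gaussian discriminant h(x) of Equation (4.130) for scalar-valued signals and, for the equal-variance, equal-prior case of Example 4.10, checks that the decision agrees with the midpoint threshold of Equation (4.132). The function name and parameter values are illustrative.

```python
import numpy as np

def discriminant(x, mu1, mu2, var1, var2, p1, p2):
    """Scalar Gaussian discriminant h(x) of Equation (4.130);
    label C1 if h(x) > ln(P2/P1), else C2 (Equation (4.128))."""
    h = (-0.5 * (x - mu1) ** 2 / var1
         + 0.5 * (x - mu2) ** 2 / var2
         + 0.5 * np.log(var2 / var1))
    return "C1" if h > np.log(p2 / p1) else "C2"

# Example 4.10: equal variances and equal priors, so the rule reduces
# to comparing x with the midpoint (mu1 + mu2) / 2 (Equation (4.132)).
mu1, mu2, var = 0.0, 2.0, 1.0
for x in [0.3, 0.9, 1.1, 1.7]:
    label = discriminant(x, mu1, mu2, var, var, 0.5, 0.5)
    midpoint_label = "C1" if x < (mu1 + mu2) / 2 else "C2"
    print(x, label, midpoint_label)  # the two labels agree
```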
4.6.2 Classification Error

Classification errors are due to the overlap of the distributions of different classes of signals. This is illustrated in Figure 4.16 for the binary classification of a scalar-valued signal and in Figure 4.17 for the binary classification of a two-dimensional signal. In each figure the overlapped area gives a measure of the classification error. The obvious solution for reducing the classification error is to reduce the overlap of the distributions. This may be achieved by increasing the distance between the mean values of the various classes or by reducing the variance of each class.
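For the scalar Gaussian case of Figure 4.16 with equal class probabilities and equal variances, the error of the midpoint-threshold classifier of Equation (4.132) has the closed form Q(|μ2 − μ1|/(2σ)), where Q is the Gaussian tail probability. The minimal sketch below illustrates how increasing the distance between the means, or reducing the variance, lowers this error; the numerical values are illustrative.

```python
from math import erf, sqrt

def q_function(z):
    """Gaussian tail probability Q(z) = P(N(0,1) > z)."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

def binary_error_probability(mu1, mu2, sigma):
    """Error probability of the midpoint-threshold classifier of
    Equation (4.132) for two equiprobable Gaussian classes with
    equal variance: P(error) = Q(|mu2 - mu1| / (2 sigma))."""
    return q_function(abs(mu2 - mu1) / (2.0 * sigma))

# Increasing the distance between the means, or reducing the variance,
# shrinks the overlap and hence the error probability.
print(binary_error_probability(0.0, 2.0, 1.0))   # ~0.159
print(binary_error_probability(0.0, 4.0, 1.0))   # ~0.023
print(binary_error_probability(0.0, 2.0, 0.5))   # ~0.023
```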















