In the binary classification of a scalar-valued variable x, the probability of classification error is given by

\[
P(\mathrm{Error} \mid x) = P(C_1)\,P(x > \mathit{Thrsh} \mid x \in C_1) + P(C_2)\,P(x < \mathit{Thrsh} \mid x \in C_2)
\tag{4.133}
\]

For two Gaussian-distributed classes of scalar-valued signals with pdfs N(x(m), µ1, σ1²) and N(x(m), µ2, σ2²), Equation (4.133) becomes

\[
P(\mathrm{Error} \mid x) = P(C_1) \int_{\mathit{Thrsh}}^{\infty} \frac{1}{\sqrt{2\pi\sigma_1^{2}}}\, \exp\!\left( -\frac{(x-\mu_1)^2}{2\sigma_1^{2}} \right) dx
\; + \; P(C_2) \int_{-\infty}^{\mathit{Thrsh}} \frac{1}{\sqrt{2\pi\sigma_2^{2}}}\, \exp\!\left( -\frac{(x-\mu_2)^2}{2\sigma_2^{2}} \right) dx
\tag{4.134}
\]

where the parameter Thrsh is the classification threshold.
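The two integrals in Equation (4.134) are Gaussian tail probabilities, so the error can be evaluated directly from the normal cdf. Below is a minimal numerical sketch; the priors, means, variances and threshold are illustrative values chosen for the example, not figures from the text, and SciPy is assumed to be available.

    import numpy as np
    from scipy.stats import norm

    # Illustrative class parameters (not from the text): two Gaussian classes.
    p_c1, mu1, sigma1 = 0.6, 0.0, 1.0
    p_c2, mu2, sigma2 = 0.4, 3.0, 1.5
    thrsh = 1.5  # classification threshold

    # Equation (4.134): P(Error) = P(C1) P(x > Thrsh | C1) + P(C2) P(x < Thrsh | C2)
    p_error = (p_c1 * norm.sf(thrsh, loc=mu1, scale=sigma1)      # class 1 mass above the threshold
               + p_c2 * norm.cdf(thrsh, loc=mu2, scale=sigma2))  # class 2 mass below the threshold
    print(f"P(Error) = {p_error:.4f}")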
4.6.3 Bayesian Classification of Discrete-Valued Parameters

Let the set Θ = {θi, i = 1, ..., M} denote the values that a discrete P-dimensional parameter vector θ can assume. In general, the observation space Y associated with a discrete parameter space Θ may be a discrete-valued or a continuous-valued space. Assuming that the observation space is continuous, the pdf of the parameter vector θi, given an observation vector y, may be expressed, using Bayes' rule, as

\[
P_{\Theta \mid Y}(\theta_i \mid \mathbf{y}) = \frac{f_{Y \mid \Theta}(\mathbf{y} \mid \theta_i)\, P_{\Theta}(\theta_i)}{f_Y(\mathbf{y})}
\tag{4.135}
\]

For the case when the observation space Y is discrete-valued, the probability density functions are replaced by the appropriate probability mass functions. The Bayesian risk in selecting the parameter vector θi given the observation y is defined as

\[
R(\theta_i \mid \mathbf{y}) = \sum_{j=1}^{M} C(\theta_i \mid \theta_j)\, P_{\Theta \mid Y}(\theta_j \mid \mathbf{y})
\tag{4.136}
\]

where C(θi|θj) is the cost of selecting the parameter θi when the true parameter is θj. The Bayesian classification Equation (4.136) can be employed to obtain the maximum a posteriori, the maximum likelihood and the minimum mean square error classifiers.
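As a concrete illustration of Equation (4.136), the following sketch computes the Bayesian risk of each candidate parameter for a hypothetical three-candidate posterior and a hypothetical cost matrix, and selects the minimum-risk candidate.

    import numpy as np

    # Hypothetical posterior pmf P(theta_j | y) over M = 3 candidate parameters.
    posterior = np.array([0.2, 0.5, 0.3])

    # Hypothetical cost matrix: cost[i, j] = C(theta_i | theta_j), the cost of
    # choosing theta_i when the true parameter is theta_j.
    cost = np.array([[0.0, 1.0, 4.0],
                     [1.0, 0.0, 2.0],
                     [2.0, 1.0, 0.0]])

    # Equation (4.136): R(theta_i | y) = sum_j C(theta_i | theta_j) P(theta_j | y)
    risk = cost @ posterior
    best = np.argmin(risk)          # Bayesian classification: minimum-risk candidate
    print(risk, "-> select theta_%d" % best)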
4.6.4 Maximum A Posteriori Classification

MAP classification corresponds to Bayesian classification with a uniform cost function defined as

\[
C(\theta_i \mid \theta_j) = 1 - \delta(\theta_i, \theta_j)
\tag{4.137}
\]

where δ(·) is the delta function. Substitution of this cost function in the Bayesian risk function yields

\[
R_{\mathrm{MAP}}(\theta_i \mid \mathbf{y}) = \sum_{j=1}^{M} \left[ 1 - \delta(\theta_i, \theta_j) \right] P_{\Theta \mid Y}(\theta_j \mid \mathbf{y}) = 1 - P_{\Theta \mid Y}(\theta_i \mid \mathbf{y})
\tag{4.138}
\]

Note that the MAP risk in selecting θi is the classification error probability, that is, the sum of the posterior probabilities of all the other candidates. From Equation (4.138), minimisation of the MAP risk function is achieved by maximisation of the posterior pmf:

\[
\hat{\theta}_{\mathrm{MAP}}(\mathbf{y}) = \arg\max_{\theta_i} P_{\Theta \mid Y}(\theta_i \mid \mathbf{y}) = \arg\max_{\theta_i} P_{\Theta}(\theta_i)\, f_{Y \mid \Theta}(\mathbf{y} \mid \theta_i)
\tag{4.139}
\]
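Because the denominator f_Y(y) in Equation (4.135) is common to all candidates, the MAP decision of Equation (4.139) needs only the products of priors and likelihoods. A minimal sketch, assuming scalar candidate parameters with Gaussian observation likelihoods; the candidate values, priors and noise standard deviation are hypothetical:

    import numpy as np
    from scipy.stats import norm

    # Hypothetical discrete parameter set: three candidate (scalar) parameters.
    thetas = np.array([-2.0, 0.0, 2.0])
    priors = np.array([0.5, 0.3, 0.2])   # P_Theta(theta_i)
    sigma = 1.0                           # assumed observation noise standard deviation

    y = 1.4                               # observed value

    # Equation (4.139): argmax_i P(theta_i) f(y | theta_i); f_Y(y) cancels out.
    likelihoods = norm.pdf(y, loc=thetas, scale=sigma)
    scores = priors * likelihoods
    theta_map = thetas[np.argmax(scores)]
    print("MAP estimate:", theta_map)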
4.6.5 Maximum-Likelihood (ML) Classification

The ML classification corresponds to Bayesian classification when the parameter θ has a uniform prior pmf and the cost function is also uniform:

\[
R_{\mathrm{ML}}(\theta_i \mid \mathbf{y}) = \sum_{j=1}^{M} \left[ 1 - \delta(\theta_i, \theta_j) \right] \frac{1}{f_Y(\mathbf{y})}\, f_{Y \mid \Theta}(\mathbf{y} \mid \theta_j)\, P_{\Theta}
= 1 - \frac{1}{f_Y(\mathbf{y})}\, f_{Y \mid \Theta}(\mathbf{y} \mid \theta_i)\, P_{\Theta}
\tag{4.140}
\]

where PΘ is the uniform pmf of θ. Minimisation of the ML risk function (4.140) is equivalent to maximisation of the likelihood f_{Y|Θ}(y|θi):

\[
\hat{\theta}_{\mathrm{ML}}(\mathbf{y}) = \arg\max_{\theta_i} f_{Y \mid \Theta}(\mathbf{y} \mid \theta_i)
\tag{4.141}
\]

4.6.6 Minimum Mean Square Error Classification

The Bayesian minimum mean square error classification results from minimisation of the following risk function:

\[
R_{\mathrm{MMSE}}(\theta_i \mid \mathbf{y}) = \sum_{j=1}^{M} \left\| \theta_i - \theta_j \right\|^2 P_{\Theta \mid Y}(\theta_j \mid \mathbf{y})
\tag{4.142}
\]

For the case when P_{Θ|Y}(θj|y) is not available, the MMSE classifier is given by

\[
\hat{\theta}_{\mathrm{MMSE}}(\mathbf{y}) = \arg\min_{\theta_i} \left\| \theta_i - \hat{\theta}(\mathbf{y}) \right\|^2
\tag{4.143}
\]

where θ̂(y) is an estimate based on the observation y.
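The MMSE rule of Equation (4.143) simply maps an available estimate θ̂(y), which may be continuous-valued, onto the nearest member of the discrete parameter set. A minimal sketch with hypothetical candidate vectors and a hypothetical estimate:

    import numpy as np

    # Hypothetical discrete set of M = 4 two-dimensional parameter vectors.
    thetas = np.array([[0.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])

    theta_hat = np.array([0.8, 0.2])   # hypothetical estimate theta_hat(y)

    # Equation (4.143): pick the candidate closest (in squared distance) to theta_hat.
    d2 = np.sum((thetas - theta_hat) ** 2, axis=1)
    theta_mmse = thetas[np.argmin(d2)]
    print("MMSE classification:", theta_mmse)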
4.6.7 Bayesian Classification of Finite State Processes

In this section, the classification problem is formulated within the framework of a finite state random process. A finite state process is composed of a probabilistic chain of a number of different random processes. Finite state processes are used for modelling non-stationary signals such as speech, image, background acoustic noise and impulsive noise, as discussed in Chapter 5.

Consider a process with a set of M states denoted as S = {s1, s2, ..., sM}, where each state has some distinct statistical property. In its simplest form, a state is just a single vector, and the finite state process is equivalent to a discrete-valued random process with M outcomes. In this case the Bayesian state estimation is identical to the Bayesian classification of a signal into one of M discrete-valued vectors.
More generally, a state generates continuous-valued or discrete-valued vectors from a pdf or a pmf associated with the state. Figure 4.18 illustrates an M-state process, where the output of the ith state is expressed as

\[
\mathbf{x}(m) = h_i\!\left( \theta_i, \mathbf{e}(m) \right), \quad i = 1, \ldots, M
\tag{4.144}
\]

where in each state the signal x(m) is modelled as the output of a state-dependent function h_i(·) with parameter θi, input e(m) and an input pdf f_{E_i}(e(m)).
The prior probability of each state is given by

\[
P_S(s_i) = \frac{E\!\left[ N(s_i) \right]}{E\!\left[ \sum_{j=1}^{M} N(s_j) \right]}
\tag{4.145}
\]

where E[N(si)] is the expected number of observations from state si. The pdf of the output of a finite state process is a weighted combination of the pdfs of the individual states and is given by

\[
f_X\!\left( \mathbf{x}(m) \right) = \sum_{i=1}^{M} P_S(s_i)\, f_{X \mid S}\!\left( \mathbf{x} \mid s_i \right)
\tag{4.146}
\]

In Figure 4.18, the noisy observation y(m) is the sum of the process output x(m) and an additive noise n(m).

Figure 4.18 Illustration of a random process generated by a finite state system.
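The following sketch illustrates Equations (4.145) and (4.146): state priors estimated from hypothetical state-occupancy counts, and the output pdf formed as the prior-weighted mixture of per-state pdfs. The per-state Gaussian means and standard deviations are illustrative assumptions, not values from the text.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical occupancy counts N(s_i) for M = 3 states.
    counts = np.array([120, 60, 20])
    state_priors = counts / counts.sum()          # Equation (4.145)

    # Illustrative per-state Gaussian output pdfs f_{X|S}(x | s_i).
    means = np.array([-1.0, 0.0, 2.0])
    stds = np.array([0.5, 1.0, 0.8])

    def mixture_pdf(x):
        """Equation (4.146): prior-weighted combination of the per-state pdfs."""
        return np.sum(state_priors * norm.pdf(x, loc=means, scale=stds))

    print(mixture_pdf(0.3))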
From Bayes' rule, the posterior probability of the state si given the observation y(m) can be expressed as

\[
P_{S \mid Y}\!\left( s_i \mid \mathbf{y}(m) \right) = \frac{f_{Y \mid S}\!\left( \mathbf{y}(m) \mid s_i \right) P_S(s_i)}{\sum_{j=1}^{M} f_{Y \mid S}\!\left( \mathbf{y}(m) \mid s_j \right) P_S(s_j)}
\tag{4.147}
\]

In MAP classification, the state with the maximum posterior probability is selected as

\[
s^{\mathrm{MAP}}\!\left( \mathbf{y}(m) \right) = \arg\max_{s_i} P_{S \mid Y}\!\left( s_i \mid \mathbf{y}(m) \right)
\tag{4.148}
\]

The Bayesian state classifier assigns a misclassification cost function C(si|sj) to the action of selecting the state si when the true state is sj. The risk function for the Bayesian classification is given by

\[
R\!\left( s_i \mid \mathbf{y}(m) \right) = \sum_{j=1}^{M} C(s_i \mid s_j)\, P_{S \mid Y}\!\left( s_j \mid \mathbf{y}(m) \right)
\tag{4.149}
\]
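Continuing in the same spirit, the state posterior of Equation (4.147) and the MAP state decision of Equation (4.148) follow directly from the state priors and the per-state observation likelihoods. The sketch below again uses illustrative Gaussian state likelihoods:

    import numpy as np
    from scipy.stats import norm

    # Illustrative state priors and per-state Gaussian observation likelihoods
    # f_{Y|S}(y | s_i); in practice these would model the signal plus noise.
    state_priors = np.array([0.6, 0.3, 0.1])
    means = np.array([-1.0, 0.0, 2.0])
    stds = np.array([0.7, 1.1, 0.9])

    y_m = 1.2                                     # observed sample y(m)

    # Equation (4.147): posterior state probabilities.
    joint = state_priors * norm.pdf(y_m, loc=means, scale=stds)
    posterior = joint / joint.sum()

    # Equation (4.148): MAP state selection.
    s_map = np.argmax(posterior)
    print(posterior, "-> MAP state s_%d" % s_map)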
4.6.8 Bayesian Estimation of the Most Likely State Sequence

Consider the estimation of the most likely state sequence s = [s_{i_0}, s_{i_1}, ..., s_{i_{T−1}}] of a finite state process, given a sequence of T observation vectors Y = [y_0, y_1, ..., y_{T−1}]. A state sequence s of length T is itself a random integer-valued vector process with M^T possible values. From Bayes' rule, the posterior pmf of a state sequence s, given an observation sequence Y, can be expressed as

\[
P_{S \mid Y}\!\left( s_{i_0}, \ldots, s_{i_{T-1}} \mid \mathbf{y}_0, \ldots, \mathbf{y}_{T-1} \right)
= \frac{f_{Y \mid S}\!\left( \mathbf{y}_0, \ldots, \mathbf{y}_{T-1} \mid s_{i_0}, \ldots, s_{i_{T-1}} \right) P_S\!\left( s_{i_0}, \ldots, s_{i_{T-1}} \right)}{f_Y\!\left( \mathbf{y}_0, \ldots, \mathbf{y}_{T-1} \right)}
\tag{4.150}
\]

where PS(s) is the pmf of the state sequence s and, for a given observation sequence, the denominator f_Y(y_0, ..., y_{T−1}) is a constant. The Bayesian risk in selecting a state sequence si is expressed as

\[
R\!\left( \mathbf{s}_i \mid \mathbf{y} \right) = \sum_{j=1}^{M^T} C\!\left( \mathbf{s}_i \mid \mathbf{s}_j \right) P_{S \mid Y}\!\left( \mathbf{s}_j \mid \mathbf{y} \right)
\tag{4.151}
\]

For a statistically independent process, the state of the process at any time is independent of the previous states, and hence the conditional probability of a state sequence can be written as

\[
P_{S \mid Y}\!\left( s_{i_0}, \ldots, s_{i_{T-1}} \mid \mathbf{y}_0, \ldots, \mathbf{y}_{T-1} \right)
= \prod_{k=0}^{T-1} f_{Y \mid S}\!\left( \mathbf{y}_k \mid s_{i_k} \right) P_S\!\left( s_{i_k} \right)
\tag{4.152}
\]

where s_{i_k} denotes state si at time instant k.
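For a statistically independent process, Equation (4.152) scores a candidate state sequence as a product of per-sample likelihoods and state priors; in practice the product is accumulated in the log domain to avoid numerical underflow. A minimal sketch with the same illustrative state models as above:

    import numpy as np
    from scipy.stats import norm

    # Illustrative state priors and per-state Gaussian likelihoods (as before).
    state_priors = np.array([0.6, 0.3, 0.1])
    means = np.array([-1.0, 0.0, 2.0])
    stds = np.array([0.7, 1.1, 0.9])

    y = np.array([-0.8, 0.1, 1.9, -1.2])   # observation sequence y_0 .. y_{T-1}
    seq = np.array([0, 1, 2, 0])            # candidate state sequence s_{i_0} .. s_{i_{T-1}}

    # Equation (4.152): product over k of f(y_k | s_{i_k}) P(s_{i_k}), in log form.
    log_score = np.sum(norm.logpdf(y, loc=means[seq], scale=stds[seq])
                       + np.log(state_priors[seq]))
    print("log score of sequence:", log_score)

Note that, for an independent process, the highest-scoring sequence is obtained simply by choosing, at each time k, the state that maximises f(y_k|s_i) P_S(s_i).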
A particular case of a finite state process is the Markov chain, where the state transitions are governed by a Markovian process such that the probability of the state at time m depends on the state of the process at time m−1. The conditional pmf of a Markov state sequence can be expressed as

\[
P_{S \mid Y}\!\left( s_{i_0}, \ldots, s_{i_{T-1}} \mid \mathbf{y}_0, \ldots, \mathbf{y}_{T-1} \right)
= \prod_{k=0}^{T-1} a_{i_{k-1} i_k}\, f_{S \mid Y}\!\left( s_{i_k} \mid \mathbf{y}_k \right)
\tag{4.153}
\]

where a_{i_{k−1} i_k} is the probability that the process moves from state s_{i_{k−1}} to state s_{i_k}.

Figure 4.19 A three-state Markov process.

Finite state random processes and computationally efficient methods of state sequence estimation are described in detail in Chapter 5.
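A short sketch of Equation (4.153): given a hypothetical transition matrix a[i][j], an assumed initial-state distribution for the k = 0 term, and per-sample factors f_{S|Y}(s_{i_k}|y_k) supplied here as illustrative numbers, the log-probability of a candidate state sequence is accumulated term by term.

    import numpy as np

    # Hypothetical 3-state Markov chain: a[i, j] = P(move from state i to state j).
    a = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4]])
    initial = np.array([0.5, 0.3, 0.2])   # assumed initial state probabilities (k = 0 term)

    seq = [0, 0, 1, 2]                     # candidate state sequence s_{i_0} .. s_{i_{T-1}}

    # Illustrative per-sample factors f_{S|Y}(s_{i_k} | y_k) for this sequence.
    f_sy = np.array([0.7, 0.6, 0.5, 0.4])

    # Equation (4.153): product of transition probabilities and per-sample factors.
    log_p = np.log(initial[seq[0]]) + np.log(f_sy[0])
    for k in range(1, len(seq)):
        log_p += np.log(a[seq[k - 1], seq[k]]) + np.log(f_sy[k])
    print("log probability of sequence:", log_p)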
4.7 Modelling the Space of a Random Process

In this section, we consider the training of statistical models for a database of P-dimensional vectors of a random process. The vectors in the database can be visualised as forming a number of clusters or regions in a P-dimensional space. The statistical modelling method consists of two steps: (a) the partitioning of the database into a number of regions, or clusters, and (b) the estimation of the parameters of a statistical model for each cluster. A simple method for modelling the space of a random signal is to use a set of prototype vectors that represent the centroids of the signal space. This method effectively quantises the space of a random process into a relatively small number of typical vectors, and is known as vector quantisation (VQ). In the following, we first consider a VQ model of a random process, and then extend this model to a pdf model based on a mixture of Gaussian densities.

4.7.1 Vector Quantisation of a Random Process

In vector quantisation, the space of a random vector process X is partitioned into K clusters or regions [X1, X2, ..., XK], and each cluster Xi is represented by a cluster centroid ci.
The set of centroid vectors [c1, c2, ..., cK] forms a VQ codebook model of the process X. The VQ codebook can then be used to classify an unlabelled vector x with the nearest centroid: the codebook is searched for the centroid vector with the minimum distance from x, and x is labelled with the index of that centroid as

\[
\mathrm{Label}(\mathbf{x}) = \arg\min_{i} d(\mathbf{x}, \mathbf{c}_i)
\tag{4.154}
\]

where d(x, ci) is a measure of distance between the vectors x and ci. The most commonly used distance measure is the mean squared distance.

4.7.2 Design of a Vector Quantiser: K-Means Clustering

The K-means algorithm, illustrated in Figure 4.20, is an iterative method for the design of a VQ codebook. Each iteration consists of two basic steps: (a) partition the training signal space into K regions or clusters, and (b) compute the centroid of each region.
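A compact sketch of both ideas: Equation (4.154) as a nearest-centroid labelling function, and a plain K-means loop alternating the two basic steps (partition, then centroid update) to design the codebook. The training data and codebook size are synthetic placeholders, and this minimal version omits practical refinements such as empty-cluster handling and convergence tests.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))          # synthetic training vectors (P = 2)
    K = 4                                   # codebook size

    def vq_label(x, codebook):
        """Equation (4.154): index of the nearest centroid to vector x."""
        return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

    # K-means codebook design: initialise centroids from the data, then iterate.
    codebook = X[rng.choice(len(X), K, replace=False)]
    for _ in range(20):
        # (a) Partition: assign each training vector to its nearest centroid.
        labels = np.array([vq_label(x, codebook) for x in X])
        # (b) Update: recompute each centroid as the mean of its cluster.
        codebook = np.array([X[labels == k].mean(axis=0) for k in range(K)])

    print(codebook)
    print("label of [0.5, -0.2]:", vq_label(np.array([0.5, -0.2]), codebook))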















