Bayesian Estimation
From Equation (4.56), we note that the ML solution is an unbiased estimate:

\[
E\bigl[\hat{\theta}_{\mathrm{ML}}\bigr] = E\!\left[\frac{1}{N}\sum_{m=0}^{N-1}\bigl(\theta + n(m)\bigr)\right] = \theta
\tag{4.59}
\]

and the variance of the ML estimate is given by

\[
\mathrm{Var}\bigl[\hat{\theta}_{\mathrm{ML}}\bigr]
= E\Bigl[\bigl(\hat{\theta}_{\mathrm{ML}} - \theta\bigr)^{2}\Bigr]
= E\!\left[\left(\frac{1}{N}\sum_{m=0}^{N-1} y(m) - \theta\right)^{\!2}\right]
= \frac{\sigma_n^2}{N}
\tag{4.60}
\]

Note that the variance of the ML estimate decreases with increasing length of observation.
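As a quick numerical check of Equations (4.59) and (4.60), the following sketch (not part of the original text) simulates many independent records of noisy observations and compares the empirical mean and variance of the sample-mean estimate with θ and σ_n²/N. The parameter value, noise variance, record length and number of trials are arbitrary illustrative choices, and NumPy is assumed to be available.

```python
# Monte Carlo check of Equations (4.59)-(4.60): the sample-mean ML estimate
# of a constant observed in AWGN is unbiased and has variance sigma_n^2 / N.
# theta, sigma_n, N and the number of trials are illustrative values only.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma_n, N, trials = 1.5, 2.0, 100, 20000

y = theta + sigma_n * rng.standard_normal((trials, N))   # y(m) = theta + n(m)
theta_ml = y.mean(axis=1)                                 # one ML estimate per trial

print("mean of ML estimates :", theta_ml.mean())          # close to theta (unbiased)
print("variance of estimates:", theta_ml.var())           # close to sigma_n**2 / N
print("sigma_n^2 / N        :", sigma_n**2 / N)
```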
Example 4.7 Estimation of a uniformly-distributed parameter observed in AWGN. Consider the effects of using a uniform parameter prior on the mean and the variance of the estimate in Example 4.6. Assume that the prior for the parameter θ is given by

\[
f_{\Theta}(\theta) =
\begin{cases}
1/(\theta_{\max}-\theta_{\min}), & \theta_{\min} \le \theta \le \theta_{\max}\\[0.5ex]
0, & \text{otherwise}
\end{cases}
\tag{4.61}
\]

as illustrated in Figure 4.11.

Figure 4.11 Illustration of the effects of a uniform prior.

From Bayes' rule, the posterior pdf is given by

\[
f_{\Theta|Y}(\theta\,|\,y) = \frac{1}{f_Y(y)}\, f_{Y|\Theta}(y\,|\,\theta)\, f_{\Theta}(\theta)
=
\begin{cases}
\dfrac{1}{f_Y(y)}\,\dfrac{1}{\theta_{\max}-\theta_{\min}}\,\dfrac{1}{(2\pi\sigma_n^2)^{N/2}}
\exp\!\left(-\dfrac{1}{2\sigma_n^2}\sum_{m=0}^{N-1}\bigl[y(m)-\theta\bigr]^{2}\right), & \theta_{\min} \le \theta \le \theta_{\max}\\[2ex]
0, & \text{otherwise}
\end{cases}
\tag{4.62}
\]

The MAP estimate is obtained by maximising the posterior pdf:

\[
\hat{\theta}_{\mathrm{MAP}}(y) =
\begin{cases}
\theta_{\min}, & \hat{\theta}_{\mathrm{ML}}(y) < \theta_{\min}\\
\hat{\theta}_{\mathrm{ML}}(y), & \theta_{\min} \le \hat{\theta}_{\mathrm{ML}}(y) \le \theta_{\max}\\
\theta_{\max}, & \hat{\theta}_{\mathrm{ML}}(y) > \theta_{\max}
\end{cases}
\tag{4.63}
\]

Note that the MAP estimate is constrained to the range θ_min to θ_max.
This constraint is desirable and moderates those estimates that, due to, say, a low signal-to-noise ratio, fall outside the range of possible values of θ. It is easy to see that the variance of an estimate constrained to the range θ_min to θ_max is less than the variance of the ML estimate, in which there is no constraint on the range of the parameter estimate:

\[
\mathrm{Var}\bigl[\hat{\theta}_{\mathrm{MAP}}\bigr]
= \int_{\theta_{\min}}^{\theta_{\max}} \bigl(\hat{\theta}_{\mathrm{MAP}} - \theta\bigr)^{2} f_{Y|\Theta}(y\,|\,\theta)\,dy
\;\le\;
\mathrm{Var}\bigl[\hat{\theta}_{\mathrm{ML}}\bigr]
= \int_{-\infty}^{\infty} \bigl(\hat{\theta}_{\mathrm{ML}} - \theta\bigr)^{2} f_{Y|\Theta}(y\,|\,\theta)\,dy
\tag{4.64}
\]
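The clipping behaviour of Equation (4.63), and the variance reduction suggested by Equation (4.64), can be illustrated with a small Monte Carlo sketch (not from the text); the true parameter, noise level, record length and prior range below are illustrative assumptions.

```python
# Monte Carlo sketch of Equations (4.63)-(4.64): under a uniform prior the MAP
# estimate is the ML estimate clipped to [theta_min, theta_max], and the
# clipped estimate has the smaller mean-square deviation from theta.
# All numerical values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma_n, N, trials = 0.4, 2.0, 10, 50000
theta_min, theta_max = 0.0, 1.0

y = theta + sigma_n * rng.standard_normal((trials, N))
theta_ml = y.mean(axis=1)                                # unconstrained ML estimate
theta_map = np.clip(theta_ml, theta_min, theta_max)      # Equation (4.63)

print("ML  mean-square error:", np.mean((theta_ml - theta) ** 2))
print("MAP mean-square error:", np.mean((theta_map - theta) ** 2))  # smaller
```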
Example 4.8 Estimation of a Gaussian-distributed parameter observed in AWGN. In this example, we consider the effect of a Gaussian prior on the mean and the variance of the MAP estimate. Assume that the parameter θ is Gaussian-distributed with mean µ_θ and variance σ_θ²:

\[
f_{\Theta}(\theta) = \frac{1}{(2\pi\sigma_\theta^2)^{1/2}}
\exp\!\left(-\frac{(\theta-\mu_\theta)^{2}}{2\sigma_\theta^2}\right)
\tag{4.65}
\]

From Bayes' rule, the posterior pdf is given as the product of the likelihood and the prior pdfs:

\[
f_{\Theta|Y}(\theta\,|\,y) = \frac{1}{f_Y(y)}\, f_{Y|\Theta}(y\,|\,\theta)\, f_{\Theta}(\theta)
= \frac{1}{f_Y(y)}\,\frac{1}{(2\pi\sigma_n^2)^{N/2}}\,\frac{1}{(2\pi\sigma_\theta^2)^{1/2}}
\exp\!\left(-\frac{1}{2\sigma_n^2}\sum_{m=0}^{N-1}\bigl[y(m)-\theta\bigr]^{2}
- \frac{1}{2\sigma_\theta^2}\bigl(\theta-\mu_\theta\bigr)^{2}\right)
\tag{4.66}
\]

The maximum a posteriori solution is obtained by setting the derivative of the log-posterior function ln f_{Θ|Y}(θ|y) with respect to θ to zero:

\[
\hat{\theta}_{\mathrm{MAP}}(y)
= \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N}\,\bar{y}
+ \frac{\sigma_n^2/N}{\sigma_\theta^2 + \sigma_n^2/N}\,\mu_\theta
\tag{4.67}
\]

where ȳ = (1/N) Σ_{m=0}^{N−1} y(m). Note that the MAP estimate is an interpolation between the ML estimate ȳ and the mean of the prior pdf µ_θ, as shown in Figure 4.12.

Figure 4.12 Illustration of the posterior pdf as the product of the likelihood and the prior.
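A minimal sketch of the closed-form estimate of Equation (4.67) follows; the prior mean and variance, the noise variance and the record length are illustrative assumptions, not values from the text.

```python
# Sketch of the closed-form MAP estimate of Equation (4.67): a weighted
# combination of the sample mean y_bar and the prior mean mu_theta.
# mu_theta, sigma_theta, sigma_n and N are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
mu_theta, sigma_theta, sigma_n, N = 0.0, 0.5, 2.0, 25

theta = rng.normal(mu_theta, sigma_theta)          # true parameter drawn from the prior
y = theta + sigma_n * rng.standard_normal(N)       # noisy observations y(m)
y_bar = y.mean()                                   # ML estimate

w = sigma_theta**2 / (sigma_theta**2 + sigma_n**2 / N)
theta_map = w * y_bar + (1.0 - w) * mu_theta       # Equation (4.67)

print("theta:", theta, " y_bar:", y_bar, " theta_map:", theta_map)
```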
The expectation of the MAP estimate is obtained by noting that the only random variable on the right-hand side of Equation (4.67) is the term ȳ, and that E[ȳ] = θ:

\[
E\bigl[\hat{\theta}_{\mathrm{MAP}}(y)\bigr]
= \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N}\,\theta
+ \frac{\sigma_n^2/N}{\sigma_\theta^2 + \sigma_n^2/N}\,\mu_\theta
\tag{4.68}
\]

and the variance of the MAP estimate is given as

\[
\mathrm{Var}\bigl[\hat{\theta}_{\mathrm{MAP}}(y)\bigr]
= \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N}\times \mathrm{Var}[\bar{y}]
= \frac{\sigma_n^2/N}{1 + \sigma_n^2/(N\sigma_\theta^2)}
\tag{4.69}
\]

Substitution of Equation (4.60) in Equation (4.69) yields

\[
\mathrm{Var}\bigl[\hat{\theta}_{\mathrm{MAP}}(y)\bigr]
= \frac{\mathrm{Var}\bigl[\hat{\theta}_{\mathrm{ML}}(y)\bigr]}{1 + \mathrm{Var}\bigl[\hat{\theta}_{\mathrm{ML}}(y)\bigr]\big/\sigma_\theta^2}
\tag{4.70}
\]

Note that as σ_θ², the variance of the parameter θ, increases, the influence of the prior decreases, and the variance of the MAP estimate tends towards the variance of the ML estimate.
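The following short calculation illustrates Equations (4.60), (4.69) and (4.70) numerically; the variances and record length are illustrative assumptions.

```python
# Illustration of Equations (4.69)-(4.70): the MAP variance expressed through
# the ML variance and the prior variance.  The numbers are illustrative.
sigma_n2, N, sigma_theta2 = 4.0, 25, 0.25

var_ml = sigma_n2 / N                              # Equation (4.60)
var_map = var_ml / (1.0 + var_ml / sigma_theta2)   # Equation (4.70)

print("Var[theta_ML] :", var_ml)
print("Var[theta_MAP]:", var_map)                  # always <= Var[theta_ML]
```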
4.2.7 The Relative Importance of the Prior and the Observation

A fundamental issue in the Bayesian inference method is the relative influence of the observation signal and the prior pdf on the outcome. The importance of the observation depends on the confidence in the observation, and the confidence in turn depends on the length of the observation and on the signal-to-noise ratio (SNR). In general, as the number of observation samples and the SNR increase, the variance of the estimate and the influence of the prior decrease. From Equation (4.67) for the estimation of a Gaussian-distributed parameter observed in AWGN, as the length of the observation N increases, the importance of the prior decreases, and the MAP estimate tends to the ML estimate:

\[
\lim_{N\to\infty}\hat{\theta}_{\mathrm{MAP}}(y)
= \lim_{N\to\infty}\left[\frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N}\,\bar{y}
+ \frac{\sigma_n^2/N}{\sigma_\theta^2 + \sigma_n^2/N}\,\mu_\theta\right]
= \bar{y} = \hat{\theta}_{\mathrm{ML}}
\tag{4.71}
\]

Figure 4.13 Illustration of the effect of increasing length of observation on the variance of an estimator.

As illustrated in Figure 4.13, as the length of the observation N tends to infinity, both the MAP and the ML estimates of the parameter tend to its true value θ.
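A brief sketch of the limit in Equation (4.71): as N grows, the weight attached to the prior mean in Equation (4.67) shrinks towards zero. The variances used below are illustrative assumptions.

```python
# Sketch of the limit in Equation (4.71): as the record length N grows the
# weight on the prior mean vanishes, so theta_MAP tends to the ML estimate
# y_bar.  sigma_theta^2 and sigma_n^2 are illustrative assumptions.
sigma_theta2, sigma_n2 = 0.25, 4.0

for N in (1, 10, 100, 1000, 10000):
    prior_weight = (sigma_n2 / N) / (sigma_theta2 + sigma_n2 / N)
    print(N, prior_weight)   # tends to zero, leaving theta_MAP ~ y_bar
```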
Example 4.9 MAP estimation of a signal in additive noise. Consider the estimation of a scalar-valued Gaussian signal x(m), observed in additive Gaussian white noise n(m), and modelled as

\[
y(m) = x(m) + n(m)
\tag{4.72}
\]

The posterior pdf of the signal x(m) is given by

\[
f_{X|Y}\bigl(x(m)\,|\,y(m)\bigr)
= \frac{1}{f_Y\bigl(y(m)\bigr)}\, f_{Y|X}\bigl(y(m)\,|\,x(m)\bigr)\, f_X\bigl(x(m)\bigr)
= \frac{1}{f_Y\bigl(y(m)\bigr)}\, f_N\bigl(y(m)-x(m)\bigr)\, f_X\bigl(x(m)\bigr)
\tag{4.73}
\]

where f_X(x(m)) = N(x(m), µ_x, σ_x²) and f_N(n(m)) = N(n(m), µ_n, σ_n²) are the Gaussian pdfs of the signal and the noise respectively. Substitution of the signal and noise pdfs in Equation (4.73) yields

\[
f_{X|Y}\bigl(x(m)\,|\,y(m)\bigr)
= \frac{1}{f_Y\bigl(y(m)\bigr)}\,
\frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\!\left(-\frac{\bigl[y(m)-x(m)-\mu_n\bigr]^{2}}{2\sigma_n^2}\right)
\times
\frac{1}{\sqrt{2\pi}\,\sigma_x}\exp\!\left(-\frac{\bigl[x(m)-\mu_x\bigr]^{2}}{2\sigma_x^2}\right)
\tag{4.74}
\]

This equation can be rewritten as

\[
f_{X|Y}\bigl(x(m)\,|\,y(m)\bigr)
= \frac{1}{f_Y\bigl(y(m)\bigr)}\,\frac{1}{2\pi\sigma_n\sigma_x}
\exp\!\left(-\frac{\sigma_x^2\bigl[y(m)-x(m)-\mu_n\bigr]^{2} + \sigma_n^2\bigl[x(m)-\mu_x\bigr]^{2}}{2\sigma_x^2\sigma_n^2}\right)
\tag{4.75}
\]

To obtain the MAP estimate we set the derivative of the log-posterior function ln f_{X|Y}(x(m)|y(m)) with respect to x(m) to zero:

\[
\frac{\partial \ln f_{X|Y}\bigl(x(m)\,|\,y(m)\bigr)}{\partial x(m)}
= \frac{2\sigma_x^2\bigl(y(m)-x(m)-\mu_n\bigr) - 2\sigma_n^2\bigl(x(m)-\mu_x\bigr)}{2\sigma_x^2\sigma_n^2} = 0
\tag{4.76}
\]

From Equation (4.76) the MAP signal estimate is given by

\[
\hat{x}(m) = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_n^2}\bigl[y(m)-\mu_n\bigr]
+ \frac{\sigma_n^2}{\sigma_x^2 + \sigma_n^2}\,\mu_x
\tag{4.77}
\]

Note that the estimate x̂(m) is a weighted linear interpolation between the unconditional mean of x(m), µ_x, and the observed value (y(m)−µ_n). At a very poor SNR, i.e. when σ_x² << σ_n², we have x̂(m) ≈ µ_x; on the other hand, for a noise-free signal, σ_n² = 0 and µ_n = 0, and we have x̂(m) = y(m).
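A small sketch of the estimator of Equation (4.77) and its two limiting cases follows; the function name and all numerical values are illustrative assumptions.

```python
# Sketch of the scalar MAP signal estimate of Equation (4.77) and its limiting
# behaviour.  The means, variances and observed value are illustrative.
def map_estimate(y, mu_x, mu_n, var_x, var_n):
    """Weighted interpolation between the observation (y - mu_n) and the prior mean mu_x."""
    return (var_x * (y - mu_n) + var_n * mu_x) / (var_x + var_n)

y = 3.0
print(map_estimate(y, mu_x=1.0, mu_n=0.0, var_x=0.01, var_n=10.0))  # ~ mu_x  (very poor SNR)
print(map_estimate(y, mu_x=1.0, mu_n=0.0, var_x=10.0, var_n=0.0))   # = y     (noise-free)
```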
Example 4.10 MAP estimate of a Gaussian–AR process observed in AWGN. Consider a vector of N samples x from an autoregressive (AR) process observed in additive Gaussian noise, and modelled as

\[
y = x + n
\tag{4.78}
\]

From Chapter 8, a vector x from an AR process may be expressed as

\[
e = A x
\tag{4.79}
\]

where A is a matrix of the AR model coefficients and the vector e is the input signal of the AR model. Assuming that the signal x is Gaussian, and that the P initial samples x_0 are known, the pdf of the signal x is given by

\[
f_X(x\,|\,x_0) = f_E(e)
= \frac{1}{(2\pi\sigma_e^2)^{N/2}}
\exp\!\left(-\frac{1}{2\sigma_e^2}\, x^{\mathrm{T}} A^{\mathrm{T}} A\, x\right)
\tag{4.80}
\]

where it is assumed that the input signal e of the AR model is a zero-mean uncorrelated process with variance σ_e².
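The text defers the construction of the coefficient matrix A to Chapter 8. The sketch below shows one common construction (an assumption, not a quotation of the text) in which A has ones on its main diagonal and the negated AR coefficients on its subdiagonals, so that Ax reproduces the model excitation for samples beyond the initial P values.

```python
# A hedged sketch of one common construction of the matrix A in e = Ax for an
# AR(P) model x(m) = sum_k a_k x(m-k) + e(m): ones on the main diagonal and the
# negated AR coefficients on the subdiagonals.  The coefficients and the
# handling of the initial samples here are illustrative assumptions.
import numpy as np

def ar_excitation_matrix(a, N):
    """Build A so that (A @ x)[m] = x[m] - sum_k a[k-1] * x[m-k] for m >= len(a)."""
    A = np.eye(N)
    for k, a_k in enumerate(a, start=1):
        A -= a_k * np.eye(N, k=-k)        # place -a_k on the k-th subdiagonal
    return A

A = ar_excitation_matrix([0.8, -0.2], N=6)    # illustrative AR(2) coefficients
print(A)
```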
The pdf of a zero-mean Gaussian noise vector n, with covariance matrix Σ_nn, is given by

\[
f_N(n) = \frac{1}{(2\pi)^{N/2}\,|\Sigma_{nn}|^{1/2}}
\exp\!\left(-\frac{1}{2}\, n^{\mathrm{T}} \Sigma_{nn}^{-1}\, n\right)
\tag{4.81}
\]

From Bayes' rule, the pdf of the signal given the noisy observation is

\[
f_{X|Y}(x\,|\,y) = \frac{f_{Y|X}(y\,|\,x)\, f_X(x)}{f_Y(y)}
= \frac{1}{f_Y(y)}\, f_N(y-x)\, f_X(x)
\tag{4.82}
\]

Substitution of the pdfs of the signal and noise in Equation (4.82) yields

\[
f_{X|Y}(x\,|\,y)
= \frac{1}{f_Y(y)\,(2\pi)^{N}\,\sigma_e^{N}\,|\Sigma_{nn}|^{1/2}}
\exp\!\left(-\frac{1}{2}\left[(y-x)^{\mathrm{T}}\Sigma_{nn}^{-1}(y-x)
+ \frac{x^{\mathrm{T}} A^{\mathrm{T}} A\, x}{\sigma_e^{2}}\right]\right)
\tag{4.83}
\]

The MAP estimate corresponds to the minimum of the argument of the exponential function in Equation (4.83).
Assuming that the argument of the exponential function is differentiable and has a well-defined minimum, we can obtain the MAP estimate from

\[
\hat{x}_{\mathrm{MAP}}(y)
= \underset{x}{\operatorname{arg\,zero}}\left\{\frac{\partial}{\partial x}
\left[(y-x)^{\mathrm{T}}\Sigma_{nn}^{-1}(y-x)
+ \frac{x^{\mathrm{T}} A^{\mathrm{T}} A\, x}{\sigma_e^{2}}\right]\right\}
\tag{4.84}
\]

The MAP estimate is

\[
\hat{x}_{\mathrm{MAP}}(y)
= \left(I + \frac{1}{\sigma_e^{2}}\,\Sigma_{nn} A^{\mathrm{T}} A\right)^{-1} y
\tag{4.85}
\]

where I is the identity matrix.
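A minimal numerical sketch of Equation (4.85) follows, assuming the subdiagonal construction of A sketched above and a white observation-noise covariance; the AR coefficients, variances and observation vector are illustrative assumptions.

```python
# Sketch of the vector MAP estimate of Equation (4.85),
# x_MAP = (I + Sigma_nn A^T A / sigma_e^2)^{-1} y, for a toy AR(2) model with
# white observation noise.  All coefficients, variances and the observation
# vector are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
N, sigma_e, sigma_n = 6, 1.0, 0.5
a = [0.8, -0.2]                                   # illustrative AR(2) coefficients

A = np.eye(N)                                     # e = A x, as in Equation (4.79)
for k, a_k in enumerate(a, start=1):
    A -= a_k * np.eye(N, k=-k)

Sigma_nn = sigma_n**2 * np.eye(N)                 # white noise covariance
y = rng.standard_normal(N)                        # stand-in noisy observation vector

x_map = np.linalg.solve(np.eye(N) + Sigma_nn @ A.T @ A / sigma_e**2, y)
print(x_map)
```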
4.3 The Estimate–Maximise (EM) Method

The EM algorithm is an iterative likelihood maximisation method with applications in blind deconvolution, model-based signal interpolation, spectral estimation from noisy observations, estimation of a set of model parameters from a training data set, etc. The EM algorithm provides a framework for solving problems in which a direct ML estimate is difficult to obtain, either because the data is incomplete or because the problem is difficult to solve directly.

To define the term incomplete data, consider a signal x from a random process X with an unknown parameter vector θ and a pdf f_{X;Θ}(x; θ). The notation f_{X;Θ}(x; θ) expresses the dependence of the pdf of X on the value of the unknown parameter θ. The signal x is the so-called complete data, and the ML estimate of the parameter vector θ may be obtained from f_{X;Θ}(x; θ). Now assume that the signal x goes through a many-to-one non-invertible transformation (e.g.















