The ML estimator corresponds to a Bayesian estimator with a uniform cost function and a uniform parameter prior pdf:

$$R_{ML}(\hat{\theta} \mid y) = \int_\theta \big[1 - \delta(\hat{\theta},\theta)\big]\, f_{Y|\Theta}(y \mid \theta)\, f_\Theta(\theta)\, d\theta = \mathrm{const.}\,\big[1 - f_{Y|\Theta}(y \mid \hat{\theta})\big] \qquad (4.26)$$

where the prior pdf f_Θ(θ) = const. From a Bayesian point of view, the main difference between the ML and MAP estimators is that the ML estimator assumes that the prior pdf of θ is uniform. Note that a uniform prior, in addition to modelling genuinely uniform pdfs, is also used when the parameter prior pdf is unknown, or when the parameter is an unknown constant.

From Equation (4.26), it is evident that minimisation of the risk function is achieved by maximisation of the likelihood function:

$$\hat{\theta}_{ML} = \arg\max_\theta f_{Y|\Theta}(y \mid \theta) \qquad (4.27)$$

In practice it is convenient to maximise the log-likelihood function instead of the likelihood:

$$\hat{\theta}_{ML} = \arg\max_\theta \log f_{Y|\Theta}(y \mid \theta) \qquad (4.28)$$

The log-likelihood is usually chosen in practice because:
(a) the logarithm is a monotonic function, and hence the log-likelihood has the same turning points as the likelihood function;
(b) the joint log-likelihood of a set of independent variables is the sum of the log-likelihoods of the individual elements; and
(c) unlike the likelihood function, the log-likelihood has a dynamic range that does not cause computational under-flow.

Example 4.3 ML estimation of the mean and variance of a Gaussian process. Consider the problem of maximum likelihood estimation of the mean vector μ_y and the covariance matrix Σ_yy of a P-dimensional Gaussian vector process from N observation vectors [y(0), y(1), ..., y(N−1)]. Assuming the observation vectors are uncorrelated, the pdf of the observation sequence is given by

$$f_Y\big(y(0), \ldots, y(N-1)\big) = \prod_{m=0}^{N-1} \frac{1}{(2\pi)^{P/2}\,|\Sigma_{yy}|^{1/2}} \exp\!\left( -\frac{1}{2}\,[y(m)-\mu_y]^T \Sigma_{yy}^{-1}\, [y(m)-\mu_y] \right) \qquad (4.29)$$

and the log-likelihood equation is given by

$$\ln f_Y\big(y(0), \ldots, y(N-1)\big) = \sum_{m=0}^{N-1} \left( -\frac{P}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_{yy}| - \frac{1}{2}\,[y(m)-\mu_y]^T \Sigma_{yy}^{-1}\, [y(m)-\mu_y] \right) \qquad (4.30)$$

Taking the derivative of the log-likelihood equation with respect to the mean vector μ_y yields

$$\frac{\partial \ln f_Y\big(y(0),\ldots,y(N-1)\big)}{\partial \mu_y} = \sum_{m=0}^{N-1} \Sigma_{yy}^{-1}\,[y(m) - \mu_y] = 0 \qquad (4.31)$$

From Equation (4.31), we have

$$\hat{\mu}_y = \frac{1}{N}\sum_{m=0}^{N-1} y(m) \qquad (4.32)$$

To obtain the ML estimate of the covariance matrix we take the derivative of the log-likelihood equation with respect to Σ_yy⁻¹:

$$\frac{\partial \ln f_Y\big(y(0),\ldots,y(N-1)\big)}{\partial \Sigma_{yy}^{-1}} = \sum_{m=0}^{N-1} \left( \frac{1}{2}\Sigma_{yy} - \frac{1}{2}\,[y(m)-\mu_y][y(m)-\mu_y]^T \right) = 0 \qquad (4.33)$$

From Equation (4.33), we obtain the estimate of the covariance matrix as

$$\hat{\Sigma}_{yy} = \frac{1}{N}\sum_{m=0}^{N-1} [y(m)-\hat{\mu}_y][y(m)-\hat{\mu}_y]^T \qquad (4.34)$$
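As a numerical check of Equations (4.32) and (4.34), the following NumPy sketch (synthetic data and hypothetical names, not from the text) computes the ML estimates of the mean vector and covariance matrix from a batch of observation vectors. Note that the ML covariance estimate normalises by N rather than N−1, so it is slightly biased for finite N.

```python
import numpy as np

def ml_mean_cov(Y):
    """ML estimates of the mean vector and covariance matrix (Eqs 4.32, 4.34).

    Y : (N, P) array of N observation vectors of dimension P.
    Returns (mu_hat, Sigma_hat); note the 1/N normalisation (not 1/(N-1)).
    """
    N = Y.shape[0]
    mu_hat = Y.mean(axis=0)          # Eq. (4.32): sample mean
    D = Y - mu_hat                   # centred observations
    Sigma_hat = (D.T @ D) / N        # Eq. (4.34): ML covariance estimate
    return mu_hat, Sigma_hat

# Synthetic example: draw N vectors from a known Gaussian and recover its moments.
rng = np.random.default_rng(0)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = rng.multivariate_normal(mu_true, Sigma_true, size=5000)
mu_hat, Sigma_hat = ml_mean_cov(Y)
print(mu_hat, Sigma_hat)
```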
Example 4.4 ML and MAP estimation of a Gaussian random parameter. Consider the estimation of a P-dimensional random parameter vector θ from an N-dimensional observation vector y. Assume that the relation between the signal vector y and the parameter vector θ is described by a linear model as

$$y = G\theta + e \qquad (4.35)$$

where e is a random excitation input signal. The pdf of the parameter vector θ given an observation vector y can be described, using Bayes' rule, as

$$f_{\Theta|Y}(\theta \mid y) = \frac{1}{f_Y(y)}\, f_{Y|\Theta}(y \mid \theta)\, f_\Theta(\theta) \qquad (4.36)$$

Assuming that the matrix G in Equation (4.35) is known, the likelihood of the signal y given the parameter vector θ is the pdf of the random vector e:

$$f_{Y|\Theta}(y \mid \theta) = f_E(e = y - G\theta) \qquad (4.37)$$

Now assume the input e is a zero-mean Gaussian random process with a diagonal covariance matrix σ_e²I, and the parameter vector θ is also a Gaussian process with mean μ_θ and covariance matrix Σ_θθ. Therefore we have

$$f_{Y|\Theta}(y \mid \theta) = f_E(e) = \frac{1}{(2\pi\sigma_e^2)^{N/2}} \exp\!\left( -\frac{1}{2\sigma_e^2}\,(y - G\theta)^T (y - G\theta) \right) \qquad (4.38)$$

and

$$f_\Theta(\theta) = \frac{1}{(2\pi)^{P/2}\,|\Sigma_{\theta\theta}|^{1/2}} \exp\!\left( -\frac{1}{2}\,(\theta - \mu_\theta)^T \Sigma_{\theta\theta}^{-1}\,(\theta - \mu_\theta) \right) \qquad (4.39)$$

The ML estimate, obtained from maximisation of the log-likelihood function ln[f_{Y|Θ}(y|θ)] with respect to θ, is given by

$$\hat{\theta}_{ML}(y) = \big(G^T G\big)^{-1} G^T y \qquad (4.40)$$

To obtain the MAP estimate we first form the posterior distribution by substituting Equations (4.38) and (4.39) in Equation (4.36):

$$f_{\Theta|Y}(\theta \mid y) = \frac{1}{f_Y(y)}\,\frac{1}{(2\pi\sigma_e^2)^{N/2}}\,\frac{1}{(2\pi)^{P/2}\,|\Sigma_{\theta\theta}|^{1/2}} \exp\!\left( -\frac{1}{2\sigma_e^2}\,(y - G\theta)^T(y - G\theta) - \frac{1}{2}\,(\theta - \mu_\theta)^T \Sigma_{\theta\theta}^{-1}\,(\theta - \mu_\theta) \right) \qquad (4.41)$$

The MAP parameter estimate is obtained by differentiating the log-posterior ln f_{Θ|Y}(θ|y) with respect to θ and setting the derivative to zero:

$$\hat{\theta}_{MAP}(y) = \big(G^T G + \sigma_e^2\, \Sigma_{\theta\theta}^{-1}\big)^{-1} \big(G^T y + \sigma_e^2\, \Sigma_{\theta\theta}^{-1}\, \mu_\theta\big) \qquad (4.42)$$

Note that as the covariance of the Gaussian-distributed parameter increases, or equivalently as Σ_θθ⁻¹ → 0, the Gaussian prior tends to a uniform prior and the MAP solution of Equation (4.42) tends to the ML solution given by Equation (4.40). Conversely, as the pdf of the parameter vector θ becomes more peaked, i.e. as Σ_θθ → 0, the estimate tends towards μ_θ.
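The closed-form estimators of Equations (4.40) and (4.42) can be sketched in a few lines of NumPy (synthetic G, y and prior, hypothetical names). The sketch also illustrates the limiting behaviour discussed above: with a very broad prior the MAP estimate approaches the ML estimate, while a tight prior pulls it towards μ_θ.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 50, 3                        # illustrative dimensions
G = rng.standard_normal((N, P))     # known model matrix
theta_true = np.array([0.5, -1.0, 2.0])
sigma_e = 0.3
y = G @ theta_true + sigma_e * rng.standard_normal(N)   # Eq. (4.35)

def map_estimate(y, G, sigma_e, mu_theta, Sigma_theta):
    """MAP estimate of Eq. (4.42) for a Gaussian prior N(mu_theta, Sigma_theta)."""
    prior_precision = np.linalg.inv(Sigma_theta)
    A = G.T @ G + sigma_e**2 * prior_precision
    b = G.T @ y + sigma_e**2 * prior_precision @ mu_theta
    return np.linalg.solve(A, b)

theta_ml = np.linalg.solve(G.T @ G, G.T @ y)             # Eq. (4.40)
mu_theta = np.zeros(P)
theta_map_tight = map_estimate(y, G, sigma_e, mu_theta, 0.01 * np.eye(P))
theta_map_broad = map_estimate(y, G, sigma_e, mu_theta, 1e6 * np.eye(P))

print("ML              :", theta_ml)
print("MAP, tight prior:", theta_map_tight)   # pulled towards mu_theta
print("MAP, broad prior:", theta_map_broad)   # approaches the ML estimate
```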
4.2.3 Minimum Mean Square Error Estimation

The Bayesian minimum mean square error (MMSE) estimate is obtained as the parameter vector that minimises a mean square error cost function (Figure 4.8), defined as

$$R_{MMSE}(\hat{\theta} \mid y) = \mathcal{E}\big[(\hat{\theta} - \theta)^2 \mid y\big] = \int_\theta (\hat{\theta} - \theta)^2\, f_{\Theta|Y}(\theta \mid y)\, d\theta \qquad (4.43)$$

Figure 4.8 Illustration of the mean square error cost function and estimate.

In the following, it is shown that the Bayesian MMSE estimate is the conditional mean of the posterior pdf. Assuming that the mean square error risk function is differentiable and has a well-defined minimum, the MMSE solution can be obtained by setting the gradient of the mean square error risk function to zero:

$$\frac{\partial R_{MMSE}(\hat{\theta} \mid y)}{\partial \hat{\theta}} = 2\hat{\theta} \int_\theta f_{\Theta|Y}(\theta \mid y)\, d\theta - 2\int_\theta \theta\, f_{\Theta|Y}(\theta \mid y)\, d\theta \qquad (4.44)$$

Since the first integral on the right-hand side of Equation (4.44) is equal to 1, we have

$$\frac{\partial R_{MMSE}(\hat{\theta} \mid y)}{\partial \hat{\theta}} = 2\hat{\theta} - 2\int_\theta \theta\, f_{\Theta|Y}(\theta \mid y)\, d\theta \qquad (4.45)$$

The MMSE solution is obtained by setting Equation (4.45) to zero:

$$\hat{\theta}_{MMSE}(y) = \int_\theta \theta\, f_{\Theta|Y}(\theta \mid y)\, d\theta \qquad (4.46)$$
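As a concrete illustration of Equation (4.46), the following sketch (a synthetic one-dimensional posterior with hypothetical values, not an example from the text) evaluates a posterior on a grid and takes its mean; the posterior mode (the MAP estimate) is shown for comparison, since for an asymmetric posterior the two differ.

```python
import numpy as np

# Synthetic one-dimensional illustration of Eq. (4.46): the MMSE estimate is the posterior mean.
# Hypothetical setup: scalar observation y = theta + noise, Gaussian likelihood, exponential prior.
theta = np.linspace(0.0, 10.0, 10_001)            # integration grid
dtheta = theta[1] - theta[0]
y_obs, sigma = 2.0, 1.0

likelihood = np.exp(-0.5 * ((y_obs - theta) / sigma) ** 2)
prior = np.exp(-theta)                            # exponential prior makes the posterior asymmetric
posterior = likelihood * prior
posterior /= posterior.sum() * dtheta             # normalise so the grid integrates to 1

theta_mmse = (theta * posterior).sum() * dtheta   # Eq. (4.46): conditional (posterior) mean
theta_map = theta[np.argmax(posterior)]           # posterior mode (MAP), for comparison
print(theta_mmse, theta_map)
```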
For cases where we do not have a pdf model of the parameter process, the minimum mean square error estimate (known as the least square error, LSE, estimate) is obtained through minimisation of a mean square error function E[e²(θ|y)]:

$$\hat{\theta}_{LSE} = \arg\min_\theta \mathcal{E}\big[e^2(\theta \mid y)\big] \qquad (4.47)$$

The LSE estimation of Equation (4.47) does not use any prior knowledge of the distribution of the signals and the parameters. This can be considered a strength of LSE in situations where the prior pdfs are unknown, but it can also be considered a weakness in cases where fairly accurate models of the priors are available but not utilised.

Example 4.5 Consider the MMSE estimation of a parameter vector θ, assuming a linear model of the observation y as

$$y = G\theta + e \qquad (4.48)$$

The LSE estimate is obtained as the parameter vector at which the gradient of the mean squared error with respect to θ is zero:

$$\frac{\partial\, e^T e}{\partial \theta} = \frac{\partial}{\partial \theta}\big( y^T y - 2\theta^T G^T y + \theta^T G^T G\,\theta \big)\Big|_{\theta = \hat{\theta}_{LSE}} = 0 \qquad (4.49)$$

From Equation (4.49) the LSE parameter estimate is given by

$$\hat{\theta}_{LSE} = \big(G^T G\big)^{-1} G^T y \qquad (4.50)$$

Note that for a Gaussian likelihood function, the LSE solution is the same as the ML solution of Equation (4.40).
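A minimal sketch of Equation (4.50) follows (synthetic data, hypothetical names). As a design note, the form (GᵀG)⁻¹Gᵀy is usually evaluated with a dedicated least-squares routine rather than an explicit matrix inverse, which is better conditioned numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((100, 4))                  # known observation matrix
theta_true = np.array([1.0, 0.0, -0.5, 2.0])
y = G @ theta_true + 0.1 * rng.standard_normal(100)

# Eq. (4.50): theta_LSE = (G^T G)^{-1} G^T y, via the normal equations.
theta_lse = np.linalg.solve(G.T @ G, G.T @ y)

# Numerically preferable equivalent: a dedicated least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(G, y, rcond=None)
print(theta_lse, theta_lstsq)
```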
4.2.4 Minimum Mean Absolute Value of Error Estimation

The minimum mean absolute value of error (MAVE) estimate (Figure 4.9) is obtained through minimisation of a Bayesian risk function defined as

$$R_{MAVE}(\hat{\theta} \mid y) = \mathcal{E}\big[\,|\hat{\theta} - \theta|\,\big|\, y\,\big] = \int_\theta |\hat{\theta} - \theta|\, f_{\Theta|Y}(\theta \mid y)\, d\theta \qquad (4.51)$$

Figure 4.9 Illustration of the mean absolute value of error cost function. Note that the MAVE estimate coincides with the conditional median of the posterior function.

In the following it is shown that the minimum mean absolute value of error estimate is the median of the parameter process. Equation (4.51) can be re-expressed as

$$R_{MAVE}(\hat{\theta} \mid y) = \int_{-\infty}^{\hat{\theta}} (\hat{\theta} - \theta)\, f_{\Theta|Y}(\theta \mid y)\, d\theta + \int_{\hat{\theta}}^{\infty} (\theta - \hat{\theta})\, f_{\Theta|Y}(\theta \mid y)\, d\theta \qquad (4.52)$$

Taking the derivative of the risk function with respect to θ̂ yields

$$\frac{\partial R_{MAVE}(\hat{\theta} \mid y)}{\partial \hat{\theta}} = \int_{-\infty}^{\hat{\theta}} f_{\Theta|Y}(\theta \mid y)\, d\theta - \int_{\hat{\theta}}^{\infty} f_{\Theta|Y}(\theta \mid y)\, d\theta \qquad (4.53)$$

The minimum mean absolute value of error estimate is obtained by setting Equation (4.53) to zero:

$$\int_{-\infty}^{\hat{\theta}_{MAVE}} f_{\Theta|Y}(\theta \mid y)\, d\theta = \int_{\hat{\theta}_{MAVE}}^{\infty} f_{\Theta|Y}(\theta \mid y)\, d\theta \qquad (4.54)$$

From Equation (4.54) we note that the MAVE estimate is the median of the posterior density.
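Continuing the same synthetic grid example used above for the MMSE estimate, the MAVE estimate of Equation (4.54) can be read off as the point where the cumulative posterior reaches one half, i.e. the posterior median.

```python
import numpy as np

# Same synthetic posterior as in the MMSE sketch above.
theta = np.linspace(0.0, 10.0, 10_001)
dtheta = theta[1] - theta[0]
posterior = np.exp(-0.5 * (2.0 - theta) ** 2) * np.exp(-theta)
posterior /= posterior.sum() * dtheta

# Eq. (4.54): the MAVE estimate is where the cumulative posterior equals 1/2 (the median).
cdf = np.cumsum(posterior) * dtheta
theta_mave = theta[np.searchsorted(cdf, 0.5)]
print(theta_mave)
```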
4.2.5 Equivalence of the MAP, ML, MMSE and MAVE Estimates for Gaussian Processes with Uniformly Distributed Parameters

Examples 4.4 and 4.5 show that for a Gaussian-distributed process the LSE estimate and the ML estimate are identical. Furthermore, Equation (4.42), for the MAP estimate of a Gaussian-distributed parameter, shows that as the parameter variance increases, or equivalently as the parameter prior pdf tends to a uniform distribution, the MAP estimate tends to the ML and LSE estimates. In general, for any symmetric distribution centred round its maximum, the mode, the mean and the median are identical. Hence, for a process with a symmetric pdf, if the prior distribution of the parameter is uniform then the MAP, ML, MMSE and MAVE parameter estimates are identical. Figure 4.10 illustrates a symmetric pdf, an asymmetric pdf, and the relative positions of the various estimates.

Figure 4.10 Illustration of a symmetric and an asymmetric pdf, their respective mode, mean and median, and the relation of these to the MAP, MAVE and MMSE estimates.
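The following short sketch (illustrative densities, not from the text) checks this numerically: for a symmetric Gaussian density the grid-based mode, mean and median coincide, whereas for a right-skewed density they separate in the order mode < median < mean, as in Figure 4.10.

```python
import numpy as np

def mode_mean_median(theta, pdf):
    """Grid-based mode, mean and median of a (possibly unnormalised) 1-D density."""
    dtheta = theta[1] - theta[0]
    pdf = pdf / (pdf.sum() * dtheta)
    mode = theta[np.argmax(pdf)]
    mean = (theta * pdf).sum() * dtheta
    median = theta[np.searchsorted(np.cumsum(pdf) * dtheta, 0.5)]
    return mode, mean, median

theta = np.linspace(-10.0, 30.0, 40_001)
symmetric = np.exp(-0.5 * (theta - 3.0) ** 2)                  # Gaussian: mode = mean = median
skewed = np.where(theta > 0, theta * np.exp(-theta / 2.0), 0)  # gamma-like: mode < median < mean
print(mode_mean_median(theta, symmetric))
print(mode_mean_median(theta, skewed))
```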
4.2.6 The Influence of the Prior on Estimation Bias and Variance

The use of a prior pdf introduces a bias in the estimate towards the range of parameter values with a relatively high prior pdf, and reduces the variance of the estimate. To illustrate the effects of the prior pdf on the bias and the variance of an estimate, we consider the following examples, in which the bias and the variance of the ML and the MAP estimates of the mean of a process are compared.

Example 4.6 Consider the ML estimation of a random scalar parameter θ, observed in zero-mean additive white Gaussian noise (AWGN) n(m) and expressed as

$$y(m) = \theta + n(m), \qquad m = 0, \ldots, N-1 \qquad (4.55)$$

It is assumed that, for each realisation of the parameter θ, N observation samples are available. Note that, since the noise is assumed to be a zero-mean process, this problem is equivalent to estimation of the mean of the process y(m). The likelihood of an observation vector y = [y(0), y(1), ..., y(N−1)] and a parameter value of θ is given by

$$f_{Y|\Theta}(y \mid \theta) = \prod_{m=0}^{N-1} f_N\big(y(m) - \theta\big) = \frac{1}{(2\pi\sigma_n^2)^{N/2}} \exp\!\left( -\frac{1}{2\sigma_n^2} \sum_{m=0}^{N-1} [y(m) - \theta]^2 \right) \qquad (4.56)$$

From Equation (4.56) the log-likelihood function is given by

$$\ln f_{Y|\Theta}(y \mid \theta) = -\frac{N}{2}\ln(2\pi\sigma_n^2) - \frac{1}{2\sigma_n^2}\sum_{m=0}^{N-1} [y(m) - \theta]^2 \qquad (4.57)$$

The ML estimate of θ, obtained by setting the derivative of ln f_{Y|Θ}(y|θ) to zero, is given by

$$\hat{\theta}_{ML} = \frac{1}{N}\sum_{m=0}^{N-1} y(m) = \bar{y} \qquad (4.58)$$

where ȳ denotes the time average of y(m).
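A small Monte Carlo sketch of Example 4.6 (hypothetical parameter values) confirms empirically that the sample-mean ML estimate of Equation (4.58) is unbiased and that its variance is close to σ_n²/N.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true, sigma_n, N, trials = 1.5, 1.0, 25, 20_000

# Each row is one realisation of y(m) = theta + n(m), m = 0..N-1 (Eq. 4.55).
y = theta_true + sigma_n * rng.standard_normal((trials, N))
theta_ml = y.mean(axis=1)            # Eq. (4.58): sample mean per realisation

print("bias    :", theta_ml.mean() - theta_true)            # ~0: the ML estimate is unbiased
print("variance:", theta_ml.var(), "expected:", sigma_n**2 / N)
```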















