Bishop, C.M. Pattern Recognition and Machine Learning (2006) — excerpt from Section 2.3, The Gaussian Distribution
In fact, the Bayesian paradigm leads very naturally to a sequential view of the inference problem. To see this in the context of the inference of the mean of a Gaussian, we write the posterior distribution with the contribution from the final data point x_N separated out, so that

    p(µ|D) ∝ [ p(µ) ∏_{n=1}^{N−1} p(x_n|µ) ] p(x_N|µ).    (2.144)

The term in square brackets is (up to a normalization coefficient) just the posterior distribution after observing N − 1 data points. We see that this can be viewed as a prior distribution, which is combined using Bayes' theorem with the likelihood function associated with data point x_N to arrive at the posterior distribution after observing N data points. This sequential view of Bayesian inference is very general and applies to any problem in which the observed data are assumed to be independent and identically distributed.

So far, we have assumed that the variance of the Gaussian distribution over the data is known and our goal is to infer the mean.
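The update in (2.144) is easy to run in practice. Below is a minimal sketch (not from the text; the function name, seed, and parameter values are illustrative) of sequential inference of a Gaussian mean with known variance, using the standard conjugate update in which precisions add and the posterior mean is precision-weighted:

```python
import numpy as np

def update_gaussian_mean(mu_prior, var_prior, x, var_known):
    """Absorb one data point x ~ N(mu, var_known) into the prior
    N(mu | mu_prior, var_prior), returning the posterior mean and variance."""
    prec_post = 1.0 / var_prior + 1.0 / var_known                 # precisions add
    var_post = 1.0 / prec_post
    mu_post = var_post * (mu_prior / var_prior + x / var_known)   # precision-weighted mean
    return mu_post, var_post

# Process the data one point at a time, as in (2.144).
rng = np.random.default_rng(0)
mu, var = 0.0, 10.0                        # broad prior over the mean
for x in rng.normal(1.5, 1.0, size=20):    # true mean 1.5, known unit variance
    mu, var = update_gaussian_mean(mu, var, x, var_known=1.0)
```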
Now let us suppose that the mean is known and we wish to infer the variance. Again, our calculations will be greatly simplified if we choose a conjugate form for the prior distribution. It turns out to be most convenient to work with the precision λ ≡ 1/σ². The likelihood function for λ takes the form

    p(X|λ) = ∏_{n=1}^N N(x_n|µ, λ^{−1}) ∝ λ^{N/2} exp{ −(λ/2) ∑_{n=1}^N (x_n − µ)² }.    (2.145)

[Figure 2.13: Plots of the gamma distribution Gam(λ|a, b) defined by (2.146) for various values of the parameters a and b: (a = 0.1, b = 0.1), (a = 1, b = 1), (a = 4, b = 6).]

The corresponding conjugate prior should therefore be proportional to the product of a power of λ and the exponential of a linear function of λ. This corresponds to the gamma distribution, which is defined by

    Gam(λ|a, b) = (1/Γ(a)) b^a λ^{a−1} exp(−bλ).    (2.146)

Here Γ(a) is the gamma function that is defined by (1.141) and that ensures that (2.146) is correctly normalized (Exercise 2.41).
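As a quick numerical check (not from the text), the density (2.146) can be evaluated with scipy, whose gamma distribution uses shape a and scale 1/b:

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

a, b = 4.0, 6.0                        # one of the settings shown in Figure 2.13
lam = np.linspace(0.01, 2.0, 200)

# scipy parameterizes Gam(lam | a, b) with shape a and scale = 1/b.
pdf = gamma.pdf(lam, a, scale=1.0 / b)

# Direct evaluation of (2.146) agrees.
pdf_direct = b**a * lam**(a - 1) * np.exp(-b * lam) / gamma_fn(a)
assert np.allclose(pdf, pdf_direct)
```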
The gamma distribution has a finite integral if a > 0, and the distribution itself is finite if a ≥ 1. It is plotted, for various values of a and b, in Figure 2.13. The mean and variance of the gamma distribution are given by (Exercise 2.42)

    E[λ] = a/b    (2.147)
    var[λ] = a/b².    (2.148)

Consider a prior distribution Gam(λ|a_0, b_0). If we multiply by the likelihood function (2.145), then we obtain a posterior distribution

    p(λ|X) ∝ λ^{a_0−1} λ^{N/2} exp{ −b_0 λ − (λ/2) ∑_{n=1}^N (x_n − µ)² }    (2.149)

which we recognize as a gamma distribution of the form Gam(λ|a_N, b_N), where

    a_N = a_0 + N/2    (2.150)
    b_N = b_0 + (1/2) ∑_{n=1}^N (x_n − µ)² = b_0 + (N/2) σ²_ML    (2.151)

and σ²_ML is the maximum likelihood estimator of the variance.
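A minimal sketch of the posterior update (2.150)–(2.151), with an illustrative function name and data:

```python
import numpy as np

def gamma_posterior(a0, b0, x, mu):
    """Posterior Gam(lam | aN, bN) over the precision, given data x and
    known mean mu, following (2.150)-(2.151)."""
    x = np.asarray(x)
    aN = a0 + len(x) / 2.0
    bN = b0 + 0.5 * np.sum((x - mu) ** 2)
    return aN, bN

data = np.random.default_rng(0).normal(0.0, 2.0, size=50)
aN, bN = gamma_posterior(0.1, 0.1, data, mu=0.0)
precision_estimate = aN / bN   # posterior mean of lambda, from (2.147)
```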
Note that in (2.149) there is no need to keep track of the normalization constants in the prior and the likelihood function because, if required, the correct coefficient can be found at the end using the normalized form (2.146) for the gamma distribution.

From (2.150), we see that the effect of observing N data points is to increase the value of the coefficient a by N/2. Thus we can interpret the parameter a_0 in the prior in terms of 2a_0 'effective' prior observations.
Similarly, from (2.151) we see that the N data points contribute N σ²_ML/2 to the parameter b, where σ²_ML is the variance, and so we can interpret the parameter b_0 in the prior as arising from the 2a_0 'effective' prior observations having variance 2b_0/(2a_0) = b_0/a_0. Recall that we made an analogous interpretation for the Dirichlet prior (Section 2.2). These distributions are examples of the exponential family, and we shall see that the interpretation of a conjugate prior in terms of effective fictitious data points is a general one for the exponential family of distributions.

Instead of working with the precision, we can consider the variance itself.
The conjugate prior in this case is called the inverse gamma distribution, although we shall not discuss this further because we will find it more convenient to work with the precision.

Now suppose that both the mean and the precision are unknown. To find a conjugate prior, we consider the dependence of the likelihood function on µ and λ

    p(X|µ, λ) = ∏_{n=1}^N (λ/2π)^{1/2} exp{ −(λ/2)(x_n − µ)² }
              ∝ [ λ^{1/2} exp(−λµ²/2) ]^N exp{ λµ ∑_{n=1}^N x_n − (λ/2) ∑_{n=1}^N x_n² }.    (2.152)

We now wish to identify a prior distribution p(µ, λ) that has the same functional dependence on µ and λ as the likelihood function and that should therefore take the form

    p(µ, λ) ∝ [ λ^{1/2} exp(−λµ²/2) ]^β exp{ cλµ − dλ }
            = exp{ −(βλ/2)(µ − c/β)² } λ^{β/2} exp{ −(d − c²/2β) λ }    (2.153)

where c, d, and β are constants. Since we can always write p(µ, λ) = p(µ|λ)p(λ), we can find p(µ|λ) and p(λ) by inspection. In particular, we see that p(µ|λ) is a Gaussian whose precision is a linear function of λ and that p(λ) is a gamma distribution, so that the normalized prior takes the form

    p(µ, λ) = N(µ|µ_0, (βλ)^{−1}) Gam(λ|a, b)    (2.154)

where we have defined new constants given by µ_0 = c/β, a = 1 + β/2, b = d − c²/2β.
The distribution (2.154) is called the normal-gamma or Gaussian-gamma distribution and is plotted in Figure 2.14. Note that this is not simply the product of an independent Gaussian prior over µ and a gamma prior over λ, because the precision of µ is a linear function of λ. Even if we chose a prior in which µ and λ were independent, the posterior distribution would exhibit a coupling between the precision of µ and the value of λ.

[Figure 2.14: Contour plot of the normal-gamma distribution (2.154) for parameter values µ_0 = 0, β = 2, a = 5 and b = 6.]
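A sketch (function name and seed are illustrative) of ancestral sampling from (2.154), using the parameter values of Figure 2.14: draw λ from its gamma marginal, then µ from the conditional Gaussian.

```python
import numpy as np
from scipy.stats import gamma, norm

def sample_normal_gamma(mu0, beta, a, b, size, rng):
    """Draw (mu, lam) pairs from (2.154): lam ~ Gam(a, b),
    then mu | lam ~ N(mu0, (beta * lam)^-1)."""
    lam = gamma.rvs(a, scale=1.0 / b, size=size, random_state=rng)
    mu = norm.rvs(loc=mu0, scale=1.0 / np.sqrt(beta * lam),
                  size=size, random_state=rng)
    return mu, lam

# Parameter values from Figure 2.14.
mu, lam = sample_normal_gamma(0.0, 2.0, 5.0, 6.0, size=1000,
                              rng=np.random.default_rng(1))
```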
In the case of the multivariate Gaussian distribution N(x|µ, Λ^{−1}) for a D-dimensional variable x, the conjugate prior distribution for the mean µ, assuming the precision is known, is again a Gaussian. For known mean and unknown precision matrix Λ, the conjugate prior is the Wishart distribution (Exercise 2.45), given by

    W(Λ|W, ν) = B |Λ|^{(ν−D−1)/2} exp{ −(1/2) Tr(W^{−1}Λ) }    (2.155)

where ν is called the number of degrees of freedom of the distribution, W is a D × D scale matrix, and Tr(·) denotes the trace. The normalization constant B is given by

    B(W, ν) = |W|^{−ν/2} ( 2^{νD/2} π^{D(D−1)/4} ∏_{i=1}^D Γ((ν + 1 − i)/2) )^{−1}.    (2.156)

Again, it is also possible to define a conjugate prior over the covariance matrix itself, rather than over the precision matrix, which leads to the inverse Wishart distribution, although we shall not discuss this further.
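As an aside not in the text: scipy's Wishart distribution uses the same parameterization as (2.155), with df = ν and scale = W, so precision matrices can be drawn directly (the values below are illustrative):

```python
import numpy as np
from scipy.stats import wishart

D = 2
W = np.eye(D)      # D x D scale matrix
nu = 5.0           # degrees of freedom
Lam = wishart.rvs(df=nu, scale=W, random_state=np.random.default_rng(2))
# Under this parameterization, E[Lambda] = nu * W.
```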
If both the mean and the precision are unknown, then, following a similar line of reasoning to the univariate case, the conjugate prior is given by

    p(µ, Λ|µ_0, β, W, ν) = N(µ|µ_0, (βΛ)^{−1}) W(Λ|W, ν)    (2.157)

which is known as the normal-Wishart or Gaussian-Wishart distribution.

2.3.7 Student's t-distribution

We have seen (Section 2.3.6) that the conjugate prior for the precision of a Gaussian is given by a gamma distribution. If we have a univariate Gaussian N(x|µ, τ^{−1}) together with a gamma prior Gam(τ|a, b) and we integrate out the precision, we obtain the marginal distribution of x in the form (Exercise 2.46)
[Figure 2.15: Plot of Student's t-distribution (2.159) for µ = 0 and λ = 1 for various values of ν (ν = 0.1, ν = 1.0, ν → ∞). The limit ν → ∞ corresponds to a Gaussian distribution with mean µ and precision λ.]

    p(x|µ, a, b) = ∫_0^∞ N(x|µ, τ^{−1}) Gam(τ|a, b) dτ
                 = ∫_0^∞ (b^a e^{−bτ} τ^{a−1}/Γ(a)) (τ/2π)^{1/2} exp{ −(τ/2)(x − µ)² } dτ
                 = (b^a/Γ(a)) (1/2π)^{1/2} [ b + (x − µ)²/2 ]^{−a−1/2} Γ(a + 1/2)    (2.158)

where we have made the change of variable z = τ[b + (x − µ)²/2].
By convention we define new parameters given by ν = 2a and λ = a/b, in terms of which the distribution p(x|µ, a, b) takes the form

    St(x|µ, λ, ν) = [Γ(ν/2 + 1/2)/Γ(ν/2)] (λ/πν)^{1/2} [ 1 + λ(x − µ)²/ν ]^{−ν/2−1/2}    (2.159)

which is known as Student's t-distribution (Exercises 2.47, 12.24).
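As an unofficial numerical cross-check, (2.159) coincides with scipy's t-distribution with df = ν, loc = µ, and scale = λ^{−1/2}:

```python
import numpy as np
from scipy.stats import t
from scipy.special import gammaln

def st_pdf(x, mu, lam, nu):
    """Student density St(x | mu, lam, nu) evaluated directly from (2.159),
    computed in log space for numerical stability."""
    log_norm = (gammaln(nu / 2 + 0.5) - gammaln(nu / 2)
                + 0.5 * np.log(lam / (np.pi * nu)))
    return np.exp(log_norm) * (1 + lam * (x - mu) ** 2 / nu) ** (-nu / 2 - 0.5)

x = np.linspace(-5, 5, 101)
assert np.allclose(st_pdf(x, mu=0.3, lam=2.0, nu=4.0),
                   t.pdf(x, df=4.0, loc=0.3, scale=2.0 ** -0.5))
```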
The parameter λ is sometimes called the precision of the t-distribution, even though it is not in general equal to the inverse of the variance. The parameter ν is called the degrees of freedom, and its effect is illustrated in Figure 2.15. For the particular case of ν = 1, the t-distribution reduces to the Cauchy distribution, while in the limit ν → ∞ the t-distribution St(x|µ, λ, ν) becomes a Gaussian N(x|µ, λ^{−1}) with mean µ and precision λ.

From (2.158), we see that Student's t-distribution is obtained by adding up an infinite number of Gaussian distributions having the same mean but different precisions. This can be interpreted as an infinite mixture of Gaussians (Gaussian mixtures will be discussed in detail in Section 2.3.9).
The result is a distribution that in general has longer 'tails' than a Gaussian, as was seen in Figure 2.15. This gives the t-distribution an important property called robustness, which means that it is much less sensitive than the Gaussian to the presence of a few data points which are outliers. The robustness of the t-distribution is illustrated in Figure 2.16, which compares the maximum likelihood solutions for a Gaussian and a t-distribution. Note that the maximum likelihood solution for the t-distribution can be found using the expectation-maximization (EM) algorithm.
[Figure 2.16: Illustration of the robustness of Student's t-distribution compared to a Gaussian. (a) Histogram distribution of 30 data points drawn from a Gaussian distribution, together with the maximum likelihood fit obtained from a t-distribution (red curve) and a Gaussian (green curve, largely hidden by the red curve). Because the t-distribution contains the Gaussian as a special case it gives almost the same solution as the Gaussian. (b) The same data set but with three additional outlying data points, showing how the Gaussian (green curve) is strongly distorted by the outliers, whereas the t-distribution (red curve) is relatively unaffected.]

Here we see that the effect of a small number of outliers is much less significant for the t-distribution than for the Gaussian.
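A rough reproduction of the Figure 2.16 experiment (the seed and outlier values are invented for illustration), using scipy's built-in maximum likelihood fitting:

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(3)
bulk = rng.normal(0.0, 1.0, size=30)              # 30 Gaussian data points, as in (a)
data = np.concatenate([bulk, [8.0, 9.0, 10.0]])   # add three outliers, as in (b)

mu_g, sigma_g = norm.fit(data)     # Gaussian ML fit: mean is dragged toward the outliers
df_t, mu_t, sigma_t = t.fit(data)  # t ML fit: location stays near the bulk of the data
```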
Outliers can arise in practical applications either because the process that generates the data corresponds to a distribution having a heavy tail or simply through mislabelled data. Robustness is also an important property for regression problems. Unsurprisingly, the least squares approach to regression does not exhibit robustness, because it corresponds to maximum likelihood under a (conditional) Gaussian distribution. By basing a regression model on a heavy-tailed distribution such as a t-distribution, we obtain a more robust model.

If we go back to (2.158) and substitute the alternative parameters ν = 2a, λ = a/b, and η = τb/a, we see that the t-distribution can be written in the form

    St(x|µ, λ, ν) = ∫_0^∞ N(x|µ, (ηλ)^{−1}) Gam(η|ν/2, ν/2) dη.    (2.160)

We can then generalize this to a multivariate Gaussian N(x|µ, Λ) to obtain the corresponding multivariate Student's t-distribution in the form

    St(x|µ, Λ, ν) = ∫_0^∞ N(x|µ, (ηΛ)^{−1}) Gam(η|ν/2, ν/2) dη.    (2.161)

Using the same technique as for the univariate case, we can evaluate this integral to give (Exercise 2.48)
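Returning to the univariate case, the mixture representation (2.160) also gives a simple two-stage sampler, sketched below (function name and seed are illustrative): draw η from Gam(ν/2, ν/2), then draw x from the corresponding Gaussian.

```python
import numpy as np
from scipy.stats import gamma, norm

def sample_st(mu, lam, nu, size, rng):
    """Draw from St(x | mu, lam, nu) via the scale mixture (2.160):
    eta ~ Gam(nu/2, nu/2), then x | eta ~ N(mu, (eta * lam)^-1)."""
    eta = gamma.rvs(nu / 2, scale=2.0 / nu, size=size, random_state=rng)  # rate nu/2
    return norm.rvs(loc=mu, scale=1.0 / np.sqrt(eta * lam),
                    size=size, random_state=rng)

samples = sample_st(0.0, 1.0, 3.0, size=10000, rng=np.random.default_rng(4))
```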