Although the resulting integral over w is no longer analytically tractable, it might be thought that approximating this integral, for example using the Laplace approximation discussed in Section 4.4, which is based on a local Gaussian approximation centred on the mode of the posterior distribution, might provide a practical alternative to the evidence framework (Buntine and Weigend, 1991). However, the integrand as a function of w typically has a strongly skewed mode, so that the Laplace approximation fails to capture the bulk of the probability mass, leading to poorer results than those obtained by maximizing the evidence (MacKay, 1999).

Returning to the evidence framework, we note that there are two approaches that we can take to the maximization of the log evidence.
We can evaluate the evidence function analytically and then set its derivative equal to zero to obtain re-estimation equations for α and β, which we shall do in Section 3.5.2. Alternatively, we can use a technique called the expectation maximization (EM) algorithm, which will be discussed in Section 9.3.4, where we shall also show that these two approaches converge to the same solution.

3.5.1 Evaluation of the evidence function

The marginal likelihood function p(t|α, β) is obtained by integrating over the weight parameters w, so that

$$p(\mathbf{t}\mid\alpha,\beta) = \int p(\mathbf{t}\mid\mathbf{w},\beta)\,p(\mathbf{w}\mid\alpha)\,\mathrm{d}\mathbf{w}. \tag{3.77}$$

One way to evaluate this integral is to make use once again of the result (2.115) for the conditional distribution in a linear-Gaussian model (Exercises 3.16 and 3.17).
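As a concrete illustration of this first route, the following sketch (a hypothetical NumPy/SciPy example, not code from the book) evaluates the log evidence directly from the Gaussian marginal t ~ N(0, β⁻¹I + α⁻¹ΦΦᵀ) implied by the linear-Gaussian result; the design matrix Phi and target vector t are assumed to be given.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_evidence_direct(Phi, t, alpha, beta):
    """Log marginal likelihood ln p(t | alpha, beta) evaluated directly from
    the Gaussian marginal t ~ N(0, beta^{-1} I + alpha^{-1} Phi Phi^T)."""
    N = Phi.shape[0]
    cov = np.eye(N) / beta + Phi @ Phi.T / alpha   # marginal covariance of t
    return multivariate_normal(mean=np.zeros(N), cov=cov).logpdf(t)
```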
Here we shall evaluate the integral instead by completing the square in the exponent and making use of the standard form for the normalization coefficient of a Gaussian.

From (3.11), (3.12), and (3.52), we can write the evidence function in the form

$$p(\mathbf{t}\mid\alpha,\beta) = \left(\frac{\beta}{2\pi}\right)^{N/2} \left(\frac{\alpha}{2\pi}\right)^{M/2} \int \exp\{-E(\mathbf{w})\}\,\mathrm{d}\mathbf{w} \tag{3.78}$$

where M is the dimensionality of w, and we have defined

$$E(\mathbf{w}) = \beta E_D(\mathbf{w}) + \alpha E_W(\mathbf{w}) = \frac{\beta}{2}\|\mathbf{t} - \boldsymbol{\Phi}\mathbf{w}\|^2 + \frac{\alpha}{2}\mathbf{w}^{\mathrm{T}}\mathbf{w}. \tag{3.79}$$

We recognize (3.79) as being equal, up to a constant of proportionality, to the regularized sum-of-squares error function (3.27). We now complete the square over w (Exercise 3.18), giving

$$E(\mathbf{w}) = E(\mathbf{m}_N) + \frac{1}{2}(\mathbf{w} - \mathbf{m}_N)^{\mathrm{T}}\mathbf{A}(\mathbf{w} - \mathbf{m}_N) \tag{3.80}$$

where we have introduced

$$\mathbf{A} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi} \tag{3.81}$$

together with

$$E(\mathbf{m}_N) = \frac{\beta}{2}\|\mathbf{t} - \boldsymbol{\Phi}\mathbf{m}_N\|^2 + \frac{\alpha}{2}\mathbf{m}_N^{\mathrm{T}}\mathbf{m}_N. \tag{3.82}$$

Note that A corresponds to the matrix of second derivatives of the error function

$$\mathbf{A} = \nabla\nabla E(\mathbf{w}) \tag{3.83}$$

and is known as the Hessian matrix.
Here we have also defined mN, given by

$$\mathbf{m}_N = \beta\mathbf{A}^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}. \tag{3.84}$$

Using (3.54), we see that A = S_N^{-1}, and hence (3.84) is equivalent to the previous definition (3.53), and therefore represents the mean of the posterior distribution (Exercise 3.19).

The integral over w can now be evaluated simply by appealing to the standard result for the normalization coefficient of a multivariate Gaussian, giving

$$\begin{aligned}
\int \exp\{-E(\mathbf{w})\}\,\mathrm{d}\mathbf{w}
&= \exp\{-E(\mathbf{m}_N)\} \int \exp\left\{-\frac{1}{2}(\mathbf{w}-\mathbf{m}_N)^{\mathrm{T}}\mathbf{A}(\mathbf{w}-\mathbf{m}_N)\right\}\mathrm{d}\mathbf{w} \\
&= \exp\{-E(\mathbf{m}_N)\}\,(2\pi)^{M/2}\,|\mathbf{A}|^{-1/2}.
\end{aligned} \tag{3.85}$$

Using (3.78), we can then write the log of the marginal likelihood in the form

$$\ln p(\mathbf{t}\mid\alpha,\beta) = \frac{M}{2}\ln\alpha + \frac{N}{2}\ln\beta - E(\mathbf{m}_N) - \frac{1}{2}\ln|\mathbf{A}| - \frac{N}{2}\ln(2\pi) \tag{3.86}$$

which is the required expression for the evidence function.
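To make the evidence function concrete, here is a minimal NumPy sketch (an illustrative implementation, not code from the book) that evaluates (3.86) for given α, β, design matrix Phi, and targets t; with the same inputs it should agree with the direct marginal evaluation sketched earlier.

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """Log marginal likelihood ln p(t | alpha, beta), equation (3.86)."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * Phi.T @ Phi          # (3.81), the Hessian
    m_N = beta * np.linalg.solve(A, Phi.T @ t)          # (3.84), posterior mean
    E_mN = (beta / 2) * np.sum((t - Phi @ m_N) ** 2) \
           + (alpha / 2) * m_N @ m_N                    # (3.82)
    _, logdet_A = np.linalg.slogdet(A)                  # ln |A|, computed stably
    return (M / 2) * np.log(alpha) + (N / 2) * np.log(beta) \
           - E_mN - logdet_A / 2 - (N / 2) * np.log(2 * np.pi)
```

A model-comparison loop would simply evaluate this function for each candidate basis (for instance, polynomial orders M = 0, …, 9) and compare the results, as in Figure 3.14.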
Returning to the polynomial regression problem, we can plot the model evidence against the order of the polynomial, as shown in Figure 3.14. Here we have assumed a prior of the form (1.65) with the parameter α fixed at α = 5 × 10⁻³.

Figure 3.14  Plot of the model evidence versus the order M, for the polynomial regression model, showing that the evidence favours the model with M = 3.

The form of this plot is very instructive. Referring back to Figure 1.4, we see that the M = 0 polynomial has a very poor fit to the data and consequently gives a relatively low value for the evidence. Going to the M = 1 polynomial greatly improves the data fit, and hence the evidence is significantly higher.
However, in going to M = 2, the data fit is improved only very marginally, due to the fact that the underlying sinusoidal function from which the data is generated is an odd function and so has no even terms in a polynomial expansion. Indeed, Figure 1.5 shows that the residual data error is reduced only slightly in going from M = 1 to M = 2. Because this richer model suffers a greater complexity penalty, the evidence actually falls in going from M = 1 to M = 2. When we go to M = 3 we obtain a significant further improvement in data fit, as seen in Figure 1.4, and so the evidence is increased again, giving the highest overall evidence for any of the polynomials.
Further increases in the value of M produce only small improvements in the fit to the data but suffer an increasing complexity penalty, leading overall to a decrease in the evidence values. Looking again at Figure 1.5, we see that the generalization error is roughly constant between M = 3 and M = 8, and it would be difficult to choose between these models on the basis of this plot alone. The evidence values, however, show a clear preference for M = 3, since this is the simplest model that gives a good explanation for the observed data.

3.5.2 Maximizing the evidence function

Let us first consider the maximization of p(t|α, β) with respect to α. This can be done by first defining the following eigenvector equation

$$\left(\beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}\right)\mathbf{u}_i = \lambda_i\mathbf{u}_i. \tag{3.87}$$

From (3.81), it then follows that A has eigenvalues α + λi.
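As a quick numerical sanity check of this claim (an illustrative sketch with an assumed random design matrix and arbitrary α, β; not from the text), one can verify that the eigenvalues of A are the λi shifted by α:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))       # hypothetical 50 x 4 design matrix
alpha, beta = 0.5, 2.0

lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)            # lambda_i from (3.87)
A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi   # Hessian, (3.81)
assert np.allclose(np.linalg.eigvalsh(A), lam + alpha)  # eigenvalues are alpha + lambda_i
```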
Now consider the derivative of the term involving ln |A| in (3.86) with respect to α. We have

$$\frac{d}{d\alpha}\ln|\mathbf{A}| = \frac{d}{d\alpha}\ln\prod_i(\lambda_i + \alpha) = \frac{d}{d\alpha}\sum_i\ln(\lambda_i + \alpha) = \sum_i\frac{1}{\lambda_i + \alpha}. \tag{3.88}$$

Thus the stationary points of (3.86) with respect to α satisfy

$$0 = \frac{M}{2\alpha} - \frac{1}{2}\mathbf{m}_N^{\mathrm{T}}\mathbf{m}_N - \frac{1}{2}\sum_i\frac{1}{\lambda_i + \alpha}. \tag{3.89}$$

Multiplying through by 2α and rearranging, we obtain

$$\alpha\,\mathbf{m}_N^{\mathrm{T}}\mathbf{m}_N = M - \alpha\sum_i\frac{1}{\lambda_i + \alpha} = \gamma. \tag{3.90}$$

Since there are M terms in the sum over i, the quantity γ can be written

$$\gamma = \sum_i\frac{\lambda_i}{\alpha + \lambda_i}. \tag{3.91}$$

The interpretation of the quantity γ will be discussed shortly (Exercise 3.20). From (3.90) we see that the value of α that maximizes the marginal likelihood satisfies

$$\alpha = \frac{\gamma}{\mathbf{m}_N^{\mathrm{T}}\mathbf{m}_N}. \tag{3.92}$$

Note that this is an implicit solution for α not only because γ depends on α, but also because the mode mN of the posterior distribution itself depends on the choice of α.
We therefore adopt an iterative procedure in which we make an initial choice for α and use this to find mN, which is given by (3.53), and also to evaluate γ, which is given by (3.91). These values are then used to re-estimate α using (3.92), and the process is repeated until convergence.
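A minimal sketch of this fixed-point iteration for α (an illustrative NumPy example with β held fixed and Phi, t assumed given; the function name and tolerance are hypothetical, not from the book) might look as follows:

```python
import numpy as np

def reestimate_alpha(Phi, t, beta, alpha0=1.0, n_iter=100, tol=1e-8):
    """Iterate (3.53), (3.91), and (3.92) to update alpha with beta held fixed."""
    N, M = Phi.shape
    lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)       # eigenvalues in (3.87)
    alpha = alpha0
    for _ in range(n_iter):
        A = alpha * np.eye(M) + beta * Phi.T @ Phi     # (3.81)
        m_N = beta * np.linalg.solve(A, Phi.T @ t)     # posterior mean, (3.53)
        gamma = np.sum(lam / (alpha + lam))            # (3.91)
        alpha_new = gamma / (m_N @ m_N)                # re-estimation equation (3.92)
        if abs(alpha_new - alpha) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha
```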
Note that because the matrix ΦᵀΦ is fixed, we can compute its eigenvalues once at the start and then simply multiply these by β to obtain the λi.

It should be emphasized that the value of α has been determined purely by looking at the training data. In contrast to maximum likelihood methods, no independent data set is required in order to optimize the model complexity.

We can similarly maximize the log marginal likelihood (3.86) with respect to β. To do this, we note that the eigenvalues λi defined by (3.87) are proportional to β, and hence dλi/dβ = λi/β, giving

$$\frac{d}{d\beta}\ln|\mathbf{A}| = \frac{d}{d\beta}\sum_i\ln(\lambda_i + \alpha) = \frac{1}{\beta}\sum_i\frac{\lambda_i}{\lambda_i + \alpha} = \frac{\gamma}{\beta}. \tag{3.93}$$

The stationary point of the marginal likelihood therefore satisfies

$$0 = \frac{N}{2\beta} - \frac{1}{2}\sum_{n=1}^{N}\left\{t_n - \mathbf{m}_N^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\right\}^2 - \frac{\gamma}{2\beta} \tag{3.94}$$

and rearranging we obtain (Exercise 3.22)

$$\frac{1}{\beta} = \frac{1}{N - \gamma}\sum_{n=1}^{N}\left\{t_n - \mathbf{m}_N^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\right\}^2. \tag{3.95}$$

Again, this is an implicit solution for β and can be solved by choosing an initial value for β, using this to calculate mN and γ, and then re-estimating β using (3.95), repeating until convergence. If both α and β are to be determined from the data, then their values can be re-estimated together after each update of γ.
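Putting the two updates together, a joint re-estimation loop might be sketched as follows (an illustrative NumPy example rather than the book's code; the function name, initial values, and tolerance are assumptions). The eigenvalues of ΦᵀΦ are computed once and simply rescaled by β on each iteration, as noted above:

```python
import numpy as np

def evidence_approximation(Phi, t, alpha=1.0, beta=1.0, n_iter=100, tol=1e-8):
    """Jointly re-estimate alpha and beta using (3.92) and (3.95)."""
    N, M = Phi.shape
    eig0 = np.linalg.eigvalsh(Phi.T @ Phi)   # eigenvalues of Phi^T Phi, computed once
    for _ in range(n_iter):
        lam = beta * eig0                                      # lambda_i in (3.87)
        A = alpha * np.eye(M) + beta * Phi.T @ Phi             # (3.81)
        m_N = beta * np.linalg.solve(A, Phi.T @ t)             # posterior mean, (3.53)
        gamma = np.sum(lam / (alpha + lam))                    # (3.91)
        alpha_new = gamma / (m_N @ m_N)                        # (3.92)
        beta_new = (N - gamma) / np.sum((t - Phi @ m_N) ** 2)  # inverse of (3.95)
        if abs(alpha_new - alpha) < tol and abs(beta_new - beta) < tol:
            break
        alpha, beta = alpha_new, beta_new
    return alpha, beta, m_N
```

The resulting α and β can then be substituted back into (3.86) to evaluate the evidence, or used to form the predictive distribution of Section 3.3.2.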
Figure 3.15  Contours of the likelihood function (red) and the prior (green) in which the axes in parameter space have been rotated to align with the eigenvectors ui of the Hessian. For α = 0, the mode of the posterior is given by the maximum likelihood solution wML, whereas for nonzero α the mode is at wMAP = mN. In the direction w1 the eigenvalue λ1, defined by (3.87), is small compared with α and so the quantity λ1/(λ1 + α) is close to zero, and the corresponding MAP value of w1 is also close to zero. By contrast, in the direction w2 the eigenvalue λ2 is large compared with α and so the quantity λ2/(λ2 + α) is close to unity, and the MAP value of w2 is close to its maximum likelihood value.

3.5.3 Effective number of parameters

The result (3.92) has an elegant interpretation (MacKay, 1992a), which provides insight into the Bayesian solution for α.